I have a dataset which contains email, city, state, zip and date. What I need is one row for each email, with city, state and zip each filled with the latest non-null value available for that column.
Input:
Output:
I am using a query like the one below, but it is taking hours to run. Is there a more efficient way to get the desired output in SQL?
```sql
select Email_Addr,zip,
row_number()over(partition by Email_Addr order by email_effective_from desc) as rn1
into #d1 from data where zip is not null and email_addr is not null;
select Email_Addr,city,
row_number()over(partition by Email_Addr order by email_effective_from desc) as rn2 into #d2
from data where city is not null and email_addr is not null;
select Email_Addr,[state],
row_number()over(partition by Email_Addr order by email_effective_from desc) as rn3 into #d3
from data where state is not null and email_addr is not null;
select a.email_addr,a.zip,b.city,c.[state] into #dff from #d1 a
full outer join #d2 b on a.email_addr=b.email_addr
full outer join #d3 c on a.email_addr=c.email_addr
```
2 answers
vdgimpew1#
If you are running SQL Server 2022, one option uses `last_value` and `ignore nulls`:
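The original query for this answer did not survive on this page, so the following is only a minimal sketch of the idea, assuming the table and column names from the question (`data`, `Email_Addr`, `city`, `[state]`, `zip`, `email_effective_from`). Each `last_value ... ignore nulls` looks across the whole partition, and a row number is used to filter down to one row per email.

```sql
-- Sketch only: requires SQL Server 2022 (IGNORE NULLS support).
select Email_Addr, city, [state], zip
from (
    select Email_Addr,
        -- The explicit frame matters: the default frame stops at the current row.
        last_value(city) ignore nulls over (
            partition by Email_Addr order by email_effective_from
            rows between unbounded preceding and unbounded following) as city,
        last_value([state]) ignore nulls over (
            partition by Email_Addr order by email_effective_from
            rows between unbounded preceding and unbounded following) as [state],
        last_value(zip) ignore nulls over (
            partition by Email_Addr order by email_effective_from
            rows between unbounded preceding and unbounded following) as zip,
        row_number() over (
            partition by Email_Addr order by email_effective_from desc) as rn
    from data
    where Email_Addr is not null
) t
where rn = 1;  -- keep a single row per email
```

This reads the table once instead of three times and avoids the temp tables and full outer joins. An index on `(Email_Addr, email_effective_from)` would likely help the window sort, though that depends on the actual table.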
Or we can use `top (1) with ties`, ordered by the row number, instead of filtering on it.

In earlier versions, one alternative uses a gaps-and-islands technique to build groups of rows, then aggregates over those groups:
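A sketch of what the gaps-and-islands version might look like, again assuming the names from the question and that dates are unique per email. A running `count()` of each column increases only on non-null values, so every non-null value starts a new group that also contains the null rows following it; `max()` over that group then carries the value forward.

```sql
-- Sketch only: works on versions without IGNORE NULLS (SQL Server 2012+).
select Email_Addr, city, [state], zip
from (
    select Email_Addr,
        -- Each group holds exactly one non-null value, so max() returns it.
        max(city)    over (partition by Email_Addr, grp_city)  as city,
        max([state]) over (partition by Email_Addr, grp_state) as [state],
        max(zip)     over (partition by Email_Addr, grp_zip)   as zip,
        row_number() over (
            partition by Email_Addr order by email_effective_from desc) as rn
    from (
        select Email_Addr, city, [state], zip, email_effective_from,
            -- count(col) skips nulls, so the running count only moves on non-null rows.
            count(city)    over (partition by Email_Addr order by email_effective_from) as grp_city,
            count([state]) over (partition by Email_Addr order by email_effective_from) as grp_state,
            count(zip)     over (partition by Email_Addr order by email_effective_from) as grp_zip
        from data
        where Email_Addr is not null
    ) g
) t
where rn = 1;  -- the latest row now carries the latest non-null value of each column
```

If a column is null on every row for an email, its group has no non-null value and the result for that column stays null.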
bxjv4tth2#
You can use something like this:
We build a string for each column by prefixing the value with the date, converting '' to NULL along the way, which simplifies the aggregation.
After the MAX aggregation, we strip the date prefix again, which leaves the latest non-null value of each column for each email.
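The query itself is missing from this page; a minimal sketch of the approach described above could look like the following. It assumes the names from the question, that `city`, `[state]` and `zip` are character columns, and that `email_effective_from` only needs day precision (style 112 formats it as `yyyymmdd`, which sorts chronologically as text).

```sql
-- Sketch only: one pass over the table, one GROUP BY, no window functions.
select Email_Addr,
    -- date prefix + value; NULL or '' makes the whole string NULL, which max() ignores
    stuff(max(convert(char(8), email_effective_from, 112) + nullif(city, '')),    1, 8, '') as city,
    stuff(max(convert(char(8), email_effective_from, 112) + nullif([state], '')), 1, 8, '') as [state],
    stuff(max(convert(char(8), email_effective_from, 112) + nullif(zip, '')),     1, 8, '') as zip
from data
where Email_Addr is not null
group by Email_Addr;
```

`stuff(..., 1, 8, '')` removes the eight-character date prefix from the winning string. If two rows for the same email share a date, `max()` picks the alphabetically larger value, so a finer-grained prefix would be needed if that matters.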