SQL Server Taking latest non null record from multiple columns and bringing down to one row in SQL without iterating/looping

dsf9zpds  于 2023-08-02  发布在  其他
关注(0)|答案(2)|浏览(80)

I have a dataset which contains email, city, state, zip and date. What i need is one row for each emails and city, state and zip be filled with the latest non-null value available for each.

Input:

Output:

I am using a query like below but it is taking hours to run. Is there any other effecient way to get the desired output in SQL?

row_number()over(partition by Email_Addr order by email_effective_from desc) as rn1
into #d1 from data where zip is not null and email_addr is not null;
select Email_Addr,city,
row_number()over(partition by Email_Addr order by email_effective_from desc) as rn2 into #d2
from data where city is not null and email_addr is not null;
select Email_Addr,[state],
row_number()over(partition by Email_Addr order by email_effective_from desc) as rn3 into #d3
from data  where state is not null and email_addr is not null;
select a.email_addr,a.zip,b.city,c.[state] into #dff from #d1 a
full outer join #d2 b on a.email_addr=b.email_addr
full outer join #d3 c on a.email_addr=c.email_addr```
vdgimpew

vdgimpew1#

If you are running SQL Server 2022, one option uses last_value and ignore nulls :

select *
from (
    select email, date,
        last_value(city)  ignore nulls over(partition by email order by date) city,
        last_value(state) ignore nulls over(partition by email order by date) state,
        last_value(zip)   ignore nulls over(partition by email order by date) zip,
        row_number()                   over(partition by email order by date desc) rn
    from mytable t
) t
where rn = 1
emaildatecitystateziprn
abc2023-01-04BJP1600071

fiddle

Or we can use with ties instead of filtering:

select top (1) with ties email, date,
    last_value(city)  ignore nulls over(partition by email order by date) city,
    last_value(state) ignore nulls over(partition by email order by date) state,
    last_value(zip)   ignore nulls over(partition by email order by date) zip,
    row_number() over(partition by email order by date desc) rn
from mytable t
order by row_number() over(partition by email order by date desc)

In earlier versions, one alternative uses a gaps-and-islands technique to build groups of rows, then aggregates over those groups:

select top (1) with ties email, date, 
    max(city)  over(partition by email, grp_city ) city,
    max(state) over(partition by email, grp_state) state,
    max(zip)   over(partition by email, grp_zip  ) zip
from (
    select t.*,
        count(city)  over(partition by email order by date) grp_city,
        count(state) over(partition by email order by date) grp_state,
        count(zip)   over(partition by email order by date) grp_zip
    from mytable t 
) t
order by row_number() over(partition by email order by date desc)

Demo on DB Fiddle

bxjv4tth

bxjv4tth2#

You can use something like this:

declare @t table  (email varchar(100), city nvarchar(100),state nvarchar(100), zip nvarchar(10), date datetime)

insert into @t
values  ('a@b.com', 'A', '', '', '20210101')
,   ('a@b.com', '', 'MN', '160007', '20210102')
,   ('a@b.com', 'B', '', '', '20210103')
,   ('a@b.com', '', 'JP', '', '20210104')
,   ('a@a.com', 'Z', 'A', 'B', '20210103')
,   ('a@a.com', 'A', 'Z', '', '20210104')

select  STUFF(max(convert(varchar(8), date, 112) + nullif(city, '')), 1, 8, '') as city
,   STUFF(max(convert(varchar(8), date, 112) + nullif(state, '')), 1, 8, '') as state
,   STUFF(max(convert(varchar(8), date, 112) + nullif(zip, '')), 1, 8, '') as zip
,   email
,   max(date) as date
from    @t
group by email

We construct strings by appending them to date, and at the same time convert '' to NULL, this simplifies our aggregation.

Then after performing MAX aggregation, we deconstruct the strings by removing the date part, this leaves the real last value per date.

相关问题