I am trying to get the latest of a set of columns by a PersonID out of a set of YearIDs.
If I have a table like this:
| DataID | PersonID | YearID | Data1A | Data1B | Data2A | Data2B |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 1 | 888 | A100 | d | 0.00 | a | 1.00 |
| 2 | 888 | A101 | NULL | NULL | b | 2.00 |
| 3 | 888 | A102 | c | 3.00 | NULL | NULL |
| 4 | 333 | A100 | a | 3.40 | e | 4.00 |
| 5 | 333 | A101 | d | 0.00 | NULL | NULL |
| 6 | 333 | A102 | NULL | NULL | NULL | NULL |
How do I get the latest of column sets Data1A, Data1B and Data2A, Data2B sorted by YearID per PersonID? This is given that Data1A and Data1B are related and Data2A and Data2B are related and can not be separated, and most recent year is A102. DataID is just an incremental PK column.
My resulting table should look like this, with Year being removed as it's no longer necessary. It should ignore NULLs but not 0's:
| DataID | PersonID | Data1A | Data1B | Data2A | Data2B |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 1 | 888 | c | 3.00 | b | 2.00 |
| 2 | 333 | d | 0.00 | e | 4.00 |
This is what I have so far, but I don't know how to take into account the fact that I want the 'max'/latest of a set of Years by PersonID. Right now it gets the max of each column but I want the most recent valid data by latest year, and it also has Data1 and Data2 not being related at all but I need them to be.
SELECT DISTINCT
T1.SID,
GroupedT1.Data1,
GroupedT1.Data2,
FROM #Table1 T1
INNER JOIN
(SELECT SID,
MAX(Data1) AS Data1,
MAX(Data2) AS Data2,
FROM #Table1
GROUP BY PersonID) GroupedT1
ON T1.PersonID = GroupedT1.PersonID
Editing thanks to Gordon for the previous answer, this is how I tried to fix my new problem:
With this solution I'm trying to get the latest for Data1 and Data2, ignoring as many NULL columns as there is, and picking data from any YearID as long as it's the latest. So if in the year A102, Data1A is NULL then it should pick year A101's Data1A, and if Data2A is null for many years, it should pick the latest (in this case, year A100). At the moment it's close but it only picks by row, and needs to pick by year and with any number of NULL data.
select t1.PersonID, t1.Data1A, t1.Data1B, t1.Data2A, t1.Data2B
from (select t1.*,
row_number() over (partition by SID order by
(case when Data1A is not null then 1 else 2 end),
(case when Data2A is not null then 1 else 2 end),
YearID desc) as seqnum
from #Table1 t1
) t1
where seqnum = 1
2条答案
按热度按时间tkclm6bt1#
This answers the original question.
I think you want a simple filtering before applying logic such as
row_number()
:EDIT:
To answer the revised question, you need to handle each columns separately. You can do this using
outer apply
:5rgfhyps2#
You can use the not exists keywork