SQL Server Keep newest value by duplicate dates

thigvfpy  于 2023-11-16  发布在  其他
关注(0)|答案(3)|浏览(140)

I have a table called 'staff':

Account   Score   UpdateTime                UpdateTime_order               
K1897     A       2023-09-08 14:57:58.113   1
K1897     B       2023-09-08 14:57:57.896   2
K1897     B       2023-08-01 10:07:57.487   3
K1897     B       2023-06-28 07:23:57.696   4
K1897     B       2023-06-05 14:20:13.789   5
K1898     C       2023-06-04 14:20:13.789   1

Every staff can only have one score per day, so Account K1897's score should be A at 2023-09-08. (The score changed from B to A that day)

To solve this problem, I decided to convert datetime to date format, then kept the newest UpdateTime when they are duplicate.

For example, I deselected the row when UpdateTime_order=2 from Account K1897 because its original UpdateTime 2023-09-08 14:57:57.896 < 2023-09-08 14:57:58.113

Account   Score   UpdateTime   UpdateTime_order               
K1897     A       2023-09-08   1      
K1897     B       2023-08-01   3
K1897     B       2023-06-28   4
K1897     B       2023-06-05   5
K1898     C       2023-06-04   1

Then reordered the UpdateTime_order based on new result.

What I expected:

Account   Score   UpdateTime   UpdateTime_order             
K1897     A       2023-09-08   1
K1897     B       2023-08-01   2
K1897     B       2023-06-28   3
K1897     B       2023-06-05   4
K1898     C       2023-06-04   1

My code:

;WITH CTE_staff AS (
select 
Account, 
Score, 
CAST([UpdateTime] AS Date) UpdateDate,
UpdateTime, 
UpdateTime_order
FROM staff
)
select 
Account, 
Score, 
UpdateDate,
ROW_NUMBER()OVER(PARTITION BY Account ORDER BY UpdateTime DESC) as UpdateTime_order3
from(
select *,
ROW_NUMBER()OVER(PARTITION BY Account, UpdateDate ORDER BY UpdateTime DESC) as UpdateTime_order2 
from CTE_staff
) jj
where jj.UpdateTime_order2=1

It ran succesfully, but I think I wrote it in a complicated way by creating new columns. Wandering if there's an easy way to do this?

fiddle: https://dbfiddle.uk/dJ1qw3Lt

lb3vh1jj

lb3vh1jj1#

Fiddle: https://dbfiddle.uk/_NZfe5JW

WITH RankedStaff AS (
    SELECT
        Account,
        Score,
        UpdateTime,
        ROW_NUMBER() OVER (PARTITION BY Account, CONVERT(DATE, UpdateTime) ORDER BY UpdateTime DESC) AS RowNum
    FROM
        staff
)
SELECT
    Account,
    Score,
    UpdateTime
FROM
    RankedStaff
WHERE
    RowNum = 1;

This query uses a common table expression (CTE) named RankedStaff to assign a row number to each row within a partition defined by the combination of Account and the date part of UpdateTime . The ROW_NUMBER() function is used to order the rows within each partition based on UpdateTime in descending order, so the row with the newest UpdateTime gets the row number 1.

The final SELECT statement then filters the results to only include rows where RowNum is equal to 1, meaning it's the row with the newest UpdateTime for each day for each account.

This should give you a result where only the rows with the latest UpdateTime for each day for each account are included.

xfb7svmp

xfb7svmp2#

Here's how to accomplish it using one ROW_NUMBER() to get the most recently updated row per account and day :

SELECT Account, Score, UpdateTime, UpdateTime_order
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY Account, CAST(UpdateTime AS Date) ORDER BY UpdateTime DESC) AS RN
  FROM staff
) AS S
WHERE rn = 1
ORDER BY UpdateTime DESC

It is also possible to use GROUP BY and the aggregate function MAX() : We need first determine the maximum update time per account and day:

SELECT Account, MAX(UpdateTime) AS MAX_UpdateTime
FROM staff
GROUP BY Account, CAST(UpdateTime AS DATE)

Then Join this dataset with the table as follow :

SELECT s.*
FROM staff s
INNER JOIN (
  SELECT Account, MAX(UpdateTime) AS MAX_UpdateTime
  FROM staff
  GROUP BY Account, CAST(UpdateTime AS DATE)
) AS t ON S.Account = t.Account AND s.UpdateTime = t.MAX_UpdateTime
ORDER BY s.UpdateTime DESC

Demo here

qc6wkl3g

qc6wkl3g3#

This should return result you said you expect:

with cte_staff as
  (select Account, Score, CAST([UpdateTime] AS Date) UpdateDate, UpdateTime_order,
      row_number() over (partition by Account, cast([UpdateTime] as Date) order by updateTime desc) rn
   from staff
  )
select Account, Score, UpdateDate, 
  row_number() over (partition by Account order by Score, rn) UpdateTime_order
from cte_staff
where rn = 1
order by Account, UpdateTime_order; 

Account     Score   UpdateDate  UpdateTime_order
K1897       A       2023-09-08  1
K1897       B       2023-06-05  2
K1897       B       2023-06-28  3
K1897       B       2023-08-01  4
K1898       C       2023-06-04  1

See the fiddle .

相关问题