Query to remove duplicates in SQL Server

7fyelxc5  于 2023-04-19  发布在  SQL Server
关注(0)|答案(2)|浏览(134)

I need a single query without using CTE / row_number to remove duplicates based on the ACTID in SQL Server.

Table :

ACTID TRID CD DATE
-------------------
 1    1     0 12/04 - To be deleted 
 1    2     2 11/04 - To be deleted 
 1    3     4 10/04 - Do not delete 
 2    4     0 10/04 - To be deleted 
 2    5     2 11/04 - Do not delete 
 3    6     4 12/04 - Do not delete 
 3    7     4 11/04 - To be deleted

Conditions:

  • Do not delete if CD = 4
  • If CD in 0, 2 or 4, 4, keep the newest record
  • If CD in 2, 4, delete 2

The output should consist of TRID's that needs to be deleted.

Able to get the required outcome using cte but unable to integrate it in Java jpa so looking for simpler approach without using cte/row_number.

hkmswyz6

hkmswyz61#

Though the query seems messy it will do the trick.

create table myTable(ACTID int, TRID int, CD int, DATE varchar(5))
insert into myTable values( 1,    1,     0, '12/04');-- - To be deleted 
 insert into myTable values(1,    2,     2, '11/04');-- - To be deleted 
 insert into myTable values(1,    3,     4, '10/04');-- - Do not delete 
 insert into myTable values(2,    4,     0, '10/04');-- - To be deleted 
 insert into myTable values(2,    5,     2, '11/04');-- - Do not delete 
 insert into myTable values(3,    6,     4, '12/04');-- - Do not delete 
 insert into myTable values(3,    7,     4, '11/04');-- - To be deleted

Delete Query:

delete a from myTable a inner join myTable b 
  on (a.ACTID=b.ACTID and b.CD=4 and a.CD<4)
  or (a.ACTID=b.ACTID and b.CD=4 and a.TRID<>b.TRID and (a.CD=4 and b.CD=4) and a.DATE<b.DATE)
  or (a.ACTID=b.ACTID and b.CD=4 and a.TRID<>b.TRID  and((a.CD = 0 and b.CD =2)or(a.CD = 2 and b.CD =0)) and a.DATE<b.DATE)
  or (a.ACTID=b.ACTID and a.TRID<>b.TRID  and(a.CD in (0,2) and b.CD in (0,2)) and a.DATE<b.DATE)
    and not exists (select * from mytable m where m.ACTID=a.ACTID and m.CD=4)

select * from myTable

ACTIDTRIDCDDATE
13410/04
25211/04
36412/04

fiddle

py49o6xq

py49o6xq2#

Here's my try:

CREATE TABLE #t(ACTID int, TRID int, CD int, DATE varchar(5))
INSERT INTO #t VALUES(1, 1, 0, '12/04')-- - To be deleted 
,(1, 2, 2, '11/04')-- - To be deleted 
,(1, 3, 4, '10/04')-- - Do not delete 
,(2, 4, 0, '10/04')-- - To be deleted 
,(2, 5, 2, '11/04')-- - Do not delete 
,(3, 6, 4, '12/04')-- - Do not delete 
,(3, 7, 4, '11/04');-- - To be deleted 
 
SELECT  *
FROM    #t t
WHERE   EXISTS(
        SELECT  1
        FROM    #t t2
        WHERE   t2.ACTID = t.actid
        AND (
            (t.cd <> 4 AND t2.cd = 4)
        OR  (t.cd = 4 AND t2.cd = 4 AND t2.DATE > t.date)
        OR  (t.cd <> 4 AND t2.cd <> 4 AND t2.DATE > t.date)
        )
    )

It returns list of ids to remove by filtering out:

  1. Those with CD <> 4 where exists a CD=4 row
  2. Where exists another CD=4 row with later date
  3. Those non CD=4 rows with lower dates than other non CD=4

相关问题