Delete duplicate rows from a table in DB2 in a single query

rkttyhzu  posted on 2023-04-30  in  DB2
Follows (0) | Answers (7) | Views (207)

I have a table with three key columns (plus a name column), as shown below:

one   |   two    |  three  |   name
------------------------------------
 A1       B1          C1        xyz
 A1       B1          C1        pqr      -> should be deleted
 A1       B1          C1        lmn      -> should be deleted
 A2       B2          C2        abc
 A2       B2          C2        def      -> should be deleted
 A3       B3          C3        ghi
------------------------------------

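The table has no primary key column. I have no control over the table, so I cannot add one.
As shown, I want to delete the rows where the combination of columns one, two and three is the same. So if A1 B1 C1 occurs three times (as in the example above), two of those rows should be deleted and only one kept.
How can I achieve this in DB2 with just one query?
It has to be a single query because I will be running it from a Java program.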

new9mtju1#

(This assumes you are using DB2 for Linux/Unix/Windows; other platforms may vary slightly.)

DELETE FROM
    (SELECT ROWNUMBER() OVER (PARTITION BY ONE, TWO, THREE) AS RN
     FROM SESSION.TEST) AS A
WHERE RN > 1;

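That should get you what you're looking for.
The query uses the OLAP function ROWNUMBER() to assign a number to each row within each ONE, TWO, THREE combination. DB2 is then able to match the rows referenced by the fullselect (A) as the rows that the DELETE statement should remove from the table. To be usable as the target of a DELETE, the fullselect has to follow the rules for a deletable view (see "deletable views" under the Notes section). Because the OVER clause has no ORDER BY, which row in each group gets RN = 1 (and so which name survives) is arbitrary.
Here's some proof (tested on LUW 9.7):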

DECLARE GLOBAL TEMPORARY TABLE SESSION.TEST (
    one CHAR(2),
    two CHAR(2),
    three CHAR(2),
    name CHAR(3)
) ON COMMIT PRESERVE ROWS;

INSERT INTO SESSION.TEST VALUES 
    ('A1', 'B1', 'C1', 'xyz'),
    ('A1', 'B1', 'C1', 'pqr'),
    ('A1', 'B1', 'C1', 'lmn'),
    ('A2', 'B2', 'C2', 'abc'),
    ('A2', 'B2', 'C2', 'def'),
    ('A3', 'B3', 'C3', 'ghi');

DELETE FROM
    (SELECT ROWNUMBER() OVER (PARTITION BY ONE, TWO, THREE) AS RN
     FROM SESSION.TEST) AS A
WHERE RN > 1;

SELECT * FROM SESSION.TEST;

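Edit 2017-03-02:
In response to a question from Ahmed Anwar: if you need to capture what was deleted, you can also combine the delete with a "data change statement". In this example, you could do something like the following, which gives you the RN column plus ONE, TWO, and THREE for each deleted row: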

SELECT * FROM OLD TABLE (
    DELETE FROM
        (SELECT 
             ROWNUMBER() OVER (PARTITION BY ONE, TWO, THREE) AS RN
            ,ONE
            ,TWO
            ,THREE
         FROM SESSION.TEST) AS A
    WHERE RN > 1
) OLD;
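
Since the question mentions running this from a Java program, here is a minimal sketch of how the single DELETE above could be issued over JDBC. The class name `DuplicateCleaner` and both method names are hypothetical, and obtaining the `Connection` (driver, URL, credentials) is assumed to happen elsewhere. It uses ROW_NUMBER(), the standard spelling of the ROWNUMBER() function used above.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper: not part of DB2 or JDBC, just a wrapper around one statement.
class DuplicateCleaner {

    // Builds the single DELETE statement for a table and its duplicate-defining columns.
    // Identifiers are concatenated as-is, so they must come from trusted code, not user input.
    static String buildDeleteSql(String table, String... keyColumns) {
        String partition = String.join(", ", keyColumns);
        return "DELETE FROM (SELECT ROW_NUMBER() OVER (PARTITION BY " + partition + ") AS RN"
             + " FROM " + table + ") AS A WHERE RN > 1";
    }

    // Executes the DELETE on an existing connection and returns the number of rows removed.
    static int deleteDuplicates(Connection conn, String table, String... keyColumns)
            throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            return stmt.executeUpdate(buildDeleteSql(table, keyColumns));
        }
    }

    public static void main(String[] args) {
        // Only prints the generated statement; a real run would pass a DB2 JDBC Connection
        // to deleteDuplicates instead.
        System.out.println(buildDeleteSql("SESSION.TEST", "ONE", "TWO", "THREE"));
    }
}
```

With a live connection, `DuplicateCleaner.deleteDuplicates(conn, "SESSION.TEST", "ONE", "TWO", "THREE")` would run the statement in one round trip and report how many duplicates were removed.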

nsc4cvqm2#

DELETE FROM the_table tt
WHERE EXISTS ( SELECT *
    FROM the_table ex
    WHERE ex.one = tt.one
    AND ex.two = tt.two
    AND ex.three = tt.three
    AND ex.zname < tt.zname -- tie-breaker...
    );

Note: your SQL dialect may vary. Note 2: "name" is a reserved word on some platforms, so it is best avoided (hence zname in the query above).

ldioqlga3#

A variation on @a_horse_with_no_name's answer for DB2 on iSeries, without the GROUP BY and IN clauses. It really works.

DELETE from the_table a 
where rrn(a) < (
-- highest relative record number among the rows sharing the same key
select max(rrn(b)) from the_table b 
where a.one = b.one and a.two = b.two and a.three = b.three
)

wbgh16ku4#

DELETE  FROM Table_Name
WHERE   Table_Name_ID NOT IN ( SELECT  MAX(Table_Name_ID)
                                    FROM    Table_Name
                                    GROUP BY one ,
                                             two, 
                                             three )

one, two and three are your duplicate columns, and Table_Name_ID is the PK.

55ooxyrt5#

Please take a backup of the table before deleting the data.

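-- Assumes name is unique and non-null across the table;
-- keeps the row with the smallest name in each group
DELETE FROM the_table
WHERE name NOT IN (SELECT MIN(name) FROM the_table
                   GROUP BY one, two, three)

Note that DELETE itself does not accept GROUP BY or HAVING, so the shorter form below is not valid SQL and the grouping has to stay in a subquery:

DELETE from TABLE Group by one,two,three Having count(*) > 2;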

vltsax256#

This is a variation of levenlevi's answer that does not require a primary key on the table (I can't test the syntax right now):

DELETE FROM the_table
WHERE  rid_bit(the_table) NOT IN (SELECT MAX(rid_bit(the_table))
                                  FROM the_table
                                  GROUP BY one,two,three)

I believe rid_bit() is not supported on iSeries, but rrn() serves the same purpose there.

sd2nnvve7#

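For others on a very old version of DB2 SQL: a combination of these posts helped me identify and remove the duplicates from 2 batches that had been posted twice.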

SELECT   * FROM     LIBRARY.TABLE a
WHERE    a.batch in (115131, 115287)
AND      EXISTS ( SELECT 1 from LIBRARY.TABLE d 
    WHERE d.batch in (115131, 115287)
     AND a.one = d.one AND a.two = d.two AND a.three = d.three 
    GROUP BY d.one, d.two, d.three 
    HAVING count(*) <> 1 )

    AND RRN(a) > (SELECT MIN(RRN(b)) FROM LIBRARY.TABLE b 
        WHERE b.batch in (115131, 115287)
        AND a.one = b.one AND a.two = b.two AND a.three = b.three );
