PostgreSQL: find duplicates across multiple columns and delete one row based on a column value

Posted by 2hh7jdfx on 2023-01-02 in PostgreSQL

I have the following table of sample data.

|  id  | type| cycle|    datetimestamp    |  status1  |  status2  | action |
|:----:|:---:|:----:|:-------------------:|:---------:|:---------:|:------:|
| 27   | AN  |  123 | 2022-12-28 04:12:31 | Normal A  | Normal A  |        |
| 27   | AR  |  124 | 2022-12-28 04:12:31 | Normal A  | Normal A  | delete |
| 19   | AN  |  125 | 2022-12-28 05:24:30 | Normal A  | Normal A  |        |
| 19   | AR  |  126 | 2022-12-28 06:18:20 | Normal A  | Normal A  |        |
| 19   | AR  |  234 | 2022-12-28 07:22:20 | Normal A  | Normal A  | delete |
| 19   | AN  |  235 | 2022-12-28 07:22:20 | Normal A  | Normal A  |        |
| 20   | AR  |  236 | 2022-12-28 08:25:49 | Normal A  | Normal A  | delete |
| 20   | AN  |  237 | 2022-12-28 08:25:49 | Normal A  | Normal A  |        |
| 19   | AR  |  129 | 2022-12-28 09:08:19 | Normal A  | Normal A  | delete |
| 19   | AN  |  127 | 2022-12-28 09:08:19 | Normal A  | Normal A  |        |
| 19   | AR  |  238 | 2022-12-28 10:04:31 | Normal A  | Normal A  | delete |
| 19   | AN  |  230 | 2022-12-28 10:04:31 | Normal A  | Normal A  |        |
| 22   | AN  |  239 | 2022-12-28 11:04:58 | Normal A  | Normal A  |        |
| 22   | AR  |  256 | 2022-12-28 11:04:58 | Normal A  | Normal A  | delete |

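I want to find the duplicates, i.e. rows that match on id, datetimestamp, status1 and status2 and differ only in type, and delete the duplicate row whose type is 'AR' (keeping the one whose type is 'AN'). Expected result: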

|  id  | type| cycle|    datetimestamp    |  status1  |  status2  |
|:----:|:---:|:----:|:-------------------:|:---------:|:---------:|
| 27   | AN  |  123 | 2022-12-28 04:12:31 | Normal A  | Normal A  |
| 19   | AN  |  125 | 2022-12-28 05:24:30 | Normal A  | Normal A  |
| 19   | AR  |  126 | 2022-12-28 06:18:20 | Normal A  | Normal A  |
| 19   | AN  |  235 | 2022-12-28 07:22:20 | Normal A  | Normal A  |
| 20   | AN  |  237 | 2022-12-28 08:25:49 | Normal A  | Normal A  |
| 19   | AN  |  127 | 2022-12-28 09:08:19 | Normal A  | Normal A  |
| 19   | AN  |  230 | 2022-12-28 10:04:31 | Normal A  | Normal A  |
| 22   | AN  |  239 | 2022-12-28 11:04:58 | Normal A  | Normal A  |

But my query returns the rows with type = 'AN' instead of the rows with type = 'AR':

~~~pgsql
select * from test_data e
where exists
 ( select * from test_data e2
   where e.datetimestamp = e2.datetimestamp and e.id = e2.id
     and e.status1 = e2.status1
     and e.status2 = e2.status2
     and e.type = 'AN' and e2.type = 'AR')
order by e.datetimestamp asc;
~~~

Setup script:

~~~pgsql
CREATE TABLE test_data (
 id character varying(2) NOT NULL,
 type character varying(2),
 cycle integer,
 datetimestamp timestamp without time zone NOT NULL,
 status1 character varying(10),
 status2 character varying(10),
 PRIMARY KEY(id, cycle, datetimestamp)
);

INSERT INTO test_data VALUES
  (27, 'AN', 123, '2022-12-28 04:12:31', 'Normal A', 'Normal A')
, (27, 'AR', 124, '2022-12-28 04:12:31', 'Normal A', 'Normal A')
, (19, 'AN', 125, '2022-12-28 05:24:30', 'Normal A', 'Normal A')
, (19, 'AR', 126, '2022-12-28 06:18:20', 'Normal A', 'Normal A')
, (19, 'AR', 234, '2022-12-28 07:22:20', 'Normal A', 'Normal A')
, (19, 'AN', 235, '2022-12-28 07:22:20', 'Normal A', 'Normal A')
, (20, 'AR', 236, '2022-12-28 08:25:49', 'Normal A', 'Normal A')
, (20, 'AN', 237, '2022-12-28 08:25:49', 'Normal A', 'Normal A')
, (19, 'AR', 129, '2022-12-28 09:08:19', 'Normal A', 'Normal A')
, (19, 'AN', 127, '2022-12-28 09:08:19', 'Normal A', 'Normal A')
, (19, 'AR', 238, '2022-12-28 10:04:31', 'Normal A', 'Normal A')
, (19, 'AN', 230, '2022-12-28 10:04:31', 'Normal A', 'Normal A')
, (22, 'AN', 239, '2022-12-28 11:04:58', 'Normal A', 'Normal A')
, (22, 'AR', 256, '2022-12-28 11:04:58', 'Normal A', 'Normal A')
;
~~~
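Before deleting anything, it can help to look at the duplicate groups the rule above produces. The following is a hypothetical diagnostic, not taken from any answer: it copies the first six sample rows into a session-local temporary table named `test_data` (a temp table shadows a permanent table of the same name for the current session only) and lists every group of rows sharing id, datetimestamp, status1 and status2, with the types and cycles in each group.

~~~pgsql
-- Hypothetical scratch copy of the first six sample rows. The DROP names
-- pg_temp explicitly, so it can only remove an earlier temp copy.
DROP TABLE IF EXISTS pg_temp.test_data;
CREATE TEMP TABLE test_data (
  id            varchar(2) NOT NULL,
  type          varchar(2),
  cycle         integer,
  datetimestamp timestamp NOT NULL,
  status1       varchar(10),
  status2       varchar(10),
  PRIMARY KEY (id, cycle, datetimestamp)
);
INSERT INTO test_data VALUES
  ('27', 'AN', 123, '2022-12-28 04:12:31', 'Normal A', 'Normal A')
, ('27', 'AR', 124, '2022-12-28 04:12:31', 'Normal A', 'Normal A')
, ('19', 'AN', 125, '2022-12-28 05:24:30', 'Normal A', 'Normal A')
, ('19', 'AR', 126, '2022-12-28 06:18:20', 'Normal A', 'Normal A')
, ('19', 'AR', 234, '2022-12-28 07:22:20', 'Normal A', 'Normal A')
, ('19', 'AN', 235, '2022-12-28 07:22:20', 'Normal A', 'Normal A');

-- One row per duplicate group. type is deliberately left out of GROUP BY,
-- because it is the column that differs between the duplicates.
SELECT id, datetimestamp, status1, status2,
       array_agg(type  ORDER BY type) AS types,
       array_agg(cycle ORDER BY type) AS cycles
FROM test_data
GROUP BY id, datetimestamp, status1, status2
HAVING count(*) > 1
ORDER BY datetimestamp;
~~~

For these six rows it should report two groups, id 27 at 04:12:31 and id 19 at 07:22:20, each with types `{AN,AR}`; the rows at 05:24:30 and 06:18:20 have no twin. The first answer builds on the same `GROUP BY ... HAVING COUNT(*) > 1` grouping.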
Answer 1 (r6vfmomb)

First add a surrogate key column, then list the rows that will be deleted with the following query:

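~~~pgsql
-- Add a surrogate key so every row can be addressed individually.
ALTER TABLE test_data ADD COLUMN test_data_id serial;

-- 'AR' rows that belong to a group of rows sharing
-- id, datetimestamp, status1 and status2.
SELECT t.test_data_id
FROM test_data t
JOIN (
    SELECT array_agg(test_data_id) AS arr
    FROM test_data
    GROUP BY id, datetimestamp, status1, status2
    HAVING COUNT(*) > 1
) AS dup ON t.test_data_id = ANY (dup.arr)
WHERE t.type = 'AR';
~~~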

So the delete statement becomes:

~~~pgsql
DELETE FROM test_data
WHERE test_data_id IN (
    SELECT t.test_data_id
    FROM test_data t
    JOIN (
        SELECT array_agg(test_data_id) AS arr
        FROM test_data
        GROUP BY id, datetimestamp, status1, status2
        HAVING COUNT(*) > 1
    ) AS dup ON t.test_data_id = ANY (dup.arr)
    WHERE t.type = 'AR'
);
~~~

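A short explanation: my solution adds a new serial column that gives every row a unique identifier, then finds the groups of duplicate rows and deletes the rows of type 'AR' in those groups.
P.S. If you are not allowed to add a column, you can use the system column ctid instead, but I don't recommend that approach.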
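A sketch of the ctid variant mentioned in the P.S., assuming you cannot add a column: it keeps the answer's structure and substitutes the system column `ctid` (the row's physical location) for `test_data_id`. `ctid` changes when a row is updated or the table is rewritten (for example by `VACUUM FULL`), so it is only relied on within a single statement here. The four-row sample table is hypothetical.

~~~pgsql
-- Hypothetical sample: one AN/AR duplicate pair plus a lone AN and a lone AR row.
-- The DROP names pg_temp explicitly, so it can only remove an earlier temp copy.
DROP TABLE IF EXISTS pg_temp.test_data;
CREATE TEMP TABLE test_data (
  id            varchar(2) NOT NULL,
  type          varchar(2),
  cycle         integer,
  datetimestamp timestamp NOT NULL,
  status1       varchar(10),
  status2       varchar(10),
  PRIMARY KEY (id, cycle, datetimestamp)
);
INSERT INTO test_data VALUES
  ('27', 'AN', 123, '2022-12-28 04:12:31', 'Normal A', 'Normal A')
, ('27', 'AR', 124, '2022-12-28 04:12:31', 'Normal A', 'Normal A')
, ('19', 'AN', 125, '2022-12-28 05:24:30', 'Normal A', 'Normal A')
, ('19', 'AR', 126, '2022-12-28 06:18:20', 'Normal A', 'Normal A');

-- Same shape as the DELETE above, with ctid standing in for test_data_id.
DELETE FROM test_data
WHERE ctid IN (
    SELECT t.ctid
    FROM test_data t
    JOIN (
        SELECT array_agg(ctid) AS arr      -- tid[] of each duplicate group
        FROM test_data
        GROUP BY id, datetimestamp, status1, status2
        HAVING count(*) > 1
    ) AS dup ON t.ctid = ANY (dup.arr)
    WHERE t.type = 'AR'
);
~~~

On this sample only the 'AR' row with cycle 124 is removed; the 'AR' row with cycle 126 has no duplicate and stays.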
Hope this helps.

Answer 2 (watbbzwu)

Try this:

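~~~pgsql
-- Delete each 'AR' row that has an 'AN' twin with the same
-- id, datetimestamp, status1 and status2.
DELETE FROM test_data a
USING test_data b
WHERE a.id = b.id
  AND a.type = 'AR'
  AND b.type = 'AN'
  AND a.datetimestamp = b.datetimestamp
  AND a.status1 = b.status1
  AND a.status2 = b.status2;
~~~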

Test results on dbfiddle.
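Because swapping the two `type` conditions silently deletes the 'AN' rows instead of the 'AR' rows, it is worth previewing a join-based delete before running it. A minimal sketch on a hypothetical three-row sample: the `DELETE ... USING` is rewritten as a `SELECT ... JOIN` with the same conditions, which lists the rows the delete would remove without changing anything.

~~~pgsql
-- Hypothetical sample: one AN/AR duplicate pair and one lone AR row.
-- The DROP names pg_temp explicitly, so it can only remove an earlier temp copy.
DROP TABLE IF EXISTS pg_temp.test_data;
CREATE TEMP TABLE test_data (
  id            varchar(2) NOT NULL,
  type          varchar(2),
  cycle         integer,
  datetimestamp timestamp NOT NULL,
  status1       varchar(10),
  status2       varchar(10),
  PRIMARY KEY (id, cycle, datetimestamp)
);
INSERT INTO test_data VALUES
  ('22', 'AN', 239, '2022-12-28 11:04:58', 'Normal A', 'Normal A')
, ('22', 'AR', 256, '2022-12-28 11:04:58', 'Normal A', 'Normal A')
, ('19', 'AR', 126, '2022-12-28 06:18:20', 'Normal A', 'Normal A');

-- Dry run: "a" is the row the DELETE would remove, "b" its 'AN' twin.
SELECT a.*
FROM test_data a
JOIN test_data b
  ON  a.id = b.id
  AND a.datetimestamp = b.datetimestamp
  AND a.status1 = b.status1
  AND a.status2 = b.status2
WHERE a.type = 'AR'
  AND b.type = 'AN';
~~~

Here it should return only the 'AR' row with cycle 256. Appending `RETURNING a.*` to the `DELETE` itself is another way to see which rows were actually removed.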

Answer 3 (mnemlml8)

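`exists` looks like a good approach. Starting from your query, I only fixed the conditions on `type` (we want to delete the 'AR' records, based on the existence of a matching 'AN' record). I also rewrote the multiple equalities as a single tuple equality, but that is just syntactic sugar: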

~~~pgsql
delete
from test_data e1
where e1.type = 'AR' and exists (
    select 1
    from test_data e2
    where (e2.datetimestamp, e2.id, e2.status1, e2.status2, e2.type)
        = (e1.datetimestamp, e1.id, e1.status1, e1.status2, 'AN')
);
~~~
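The schema allows `status1` and `status2` to be NULL. If that can happen, the tuple `=` comparison (like the chained `=` conditions) yields NULL rather than true for a pair of NULLs, so such an 'AR' row is kept. A sketch of a NULL-safe variant, assuming NULL should match NULL: it uses the row-wise `IS NOT DISTINCT FROM` comparison. The three-row sample is hypothetical.

~~~pgsql
-- Hypothetical sample in which status2 is NULL on both rows of a duplicate pair.
-- The DROP names pg_temp explicitly, so it can only remove an earlier temp copy.
DROP TABLE IF EXISTS pg_temp.test_data;
CREATE TEMP TABLE test_data (
  id            varchar(2) NOT NULL,
  type          varchar(2),
  cycle         integer,
  datetimestamp timestamp NOT NULL,
  status1       varchar(10),
  status2       varchar(10),
  PRIMARY KEY (id, cycle, datetimestamp)
);
INSERT INTO test_data VALUES
  ('27', 'AN', 123, '2022-12-28 04:12:31', 'Normal A', NULL)
, ('27', 'AR', 124, '2022-12-28 04:12:31', 'Normal A', NULL)
, ('19', 'AR', 126, '2022-12-28 06:18:20', 'Normal A', 'Normal A');

-- IS NOT DISTINCT FROM treats two NULLs as equal, so the AR row with
-- cycle 124 matches its AN twin even though status2 is NULL on both.
DELETE FROM test_data e1
WHERE e1.type = 'AR' AND EXISTS (
    SELECT 1
    FROM test_data e2
    WHERE e2.type = 'AN'
      AND (e2.datetimestamp, e2.id, e2.status1, e2.status2)
          IS NOT DISTINCT FROM (e1.datetimestamp, e1.id, e1.status1, e1.status2)
);
~~~

With plain `=`, the 'AR' row with cycle 124 would survive on this sample; with `IS NOT DISTINCT FROM` it is deleted. The `GROUP BY` approach in the first answer already puts NULLs into the same group, so it behaves like this NULL-safe variant.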
