How to ignore duplicate keys during 'COPY FROM' in PostgreSQL

wlzqhblo  posted on 2022-12-12  in PostgreSQL

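I have to load a large amount of data from a file into a PostgreSQL table. I know PostgreSQL does not support "IGNORE", "REPLACE" and similar options the way MySQL does. Almost every post on the web about this problem suggests the same thing: load the data into a temp table and then run "INSERT ... SELECT ... WHERE NOT EXISTS ...".
That does not help in one case: when the file data itself contains duplicate primary keys. Does anybody know how to handle this in PostgreSQL?
P.S. I am doing this from a Java program, if that helps.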
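
For reference, the staging-table pattern described in the question might look like the sketch below. The table and column names (`main_table`, `staging`, `id`, `payload`) are hypothetical, and plain `INSERT`s stand in for the `COPY`. It shows why the pattern alone is not enough: `NOT EXISTS` only sees rows that were in the target before the statement started, so two input rows with the same key both pass the check.

```sql
-- Hypothetical names throughout; a sketch of the failure mode, not a fix.
CREATE TABLE main_table (id integer PRIMARY KEY, payload text);
INSERT INTO main_table VALUES (1, 'existing');

CREATE TEMP TABLE staging (id integer, payload text);
-- Stand-in for COPY staging FROM '...': key 2 appears twice in the "file".
INSERT INTO staging VALUES (1, 'dup of existing'), (2, 'first'), (2, 'second');

-- Skips id 1 (already present), but both id 2 rows pass the NOT EXISTS test,
-- so this statement fails with a duplicate key error on the primary key.
INSERT INTO main_table
SELECT s.*
FROM staging s
WHERE NOT EXISTS (SELECT 1 FROM main_table m WHERE m.id = s.id);
```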

z6psavjg 1#

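Use the same approach you described, but DELETE (or group, or modify ...) the duplicate PKs in the temp table before loading into the main table.
Something like: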

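BEGIN;  -- ON COMMIT DROP removes the temp table at commit, so use an explicit transaction

CREATE TEMP TABLE tmp_table 
ON COMMIT DROP
AS
SELECT * 
FROM main_table
WITH NO DATA;

COPY tmp_table FROM 'full/file/name/here';

INSERT INTO main_table
SELECT DISTINCT ON (PK_field) *
FROM tmp_table
ORDER BY PK_field, some_fields;  -- ORDER BY must start with the DISTINCT ON expression

COMMIT;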

Details: CREATE TABLE AS, COPY, DISTINCT ON

qhhrdooz 2#

PostgreSQL 9.5 now has upsert functionality. You can follow Igor's instructions, except that the final INSERT includes the clause ON CONFLICT DO NOTHING.

INSERT INTO main_table
SELECT *
FROM tmp_table
ON CONFLICT DO NOTHING;
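
Put together, the whole load might look like the following sketch. It assumes PostgreSQL 9.5 or later; the table definition and the column names `id` and `payload` are hypothetical, and a plain `INSERT` stands in for the `COPY`. Naming the conflict target is optional, but it limits the silent skip to primary key collisions.

```sql
-- Hypothetical table; assumes PostgreSQL 9.5 or later.
CREATE TABLE IF NOT EXISTS main_table (id integer PRIMARY KEY, payload text);

BEGIN;
CREATE TEMP TABLE tmp_table ON COMMIT DROP AS
SELECT * FROM main_table WITH NO DATA;

-- Stand-in for COPY tmp_table FROM '...'; id 2 is duplicated in the input.
INSERT INTO tmp_table VALUES (2, 'first'), (2, 'second'), (3, 'third');

-- Rows whose id already exists, or was inserted earlier by this same
-- statement, are skipped; violations of other constraints still raise errors.
INSERT INTO main_table
SELECT * FROM tmp_table
ON CONFLICT (id) DO NOTHING;
COMMIT;
```

Which of the two `id = 2` rows is kept depends on the order in which rows are read from `tmp_table`, so combine this with `DISTINCT ON` and an `ORDER BY` if the choice matters.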

jum4pzuy 3#

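Igor's answer helped me a lot, but I also ran into the problem Nate mentioned in his comment. Then I hit another problem (possibly beyond the scope of this question): the new data not only contained internal duplicates but also duplicated existing data. What worked for me was the following.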

CREATE TEMP TABLE tmp_table AS SELECT * FROM newsletter_subscribers;
COPY tmp_table (name, email) FROM stdin DELIMITER ' ' CSV;
SELECT count(*) FROM tmp_table;  -- Just to be sure
TRUNCATE newsletter_subscribers;
INSERT INTO newsletter_subscribers
    SELECT DISTINCT ON (email) * FROM tmp_table
    ORDER BY email, subscription_status;
SELECT count(*) FROM newsletter_subscribers;  -- Paranoid again

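Internal and external duplicates both end up as ordinary duplicates inside tmp_table, and the DISTINCT ON (email) part then removes them. The ORDER BY ensures that the desired row comes first within each group in the result set, and DISTINCT ON discards all the rows after it.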
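
To see which row survives, here is a small standalone query using made-up values in place of tmp_table. Only the column names `email` and `subscription_status` come from the answer above; the data is illustrative.

```sql
-- Illustrative data only; no tables are needed to run this.
SELECT DISTINCT ON (email) *
FROM (VALUES
        ('a@example.com', 'unsubscribed'),
        ('a@example.com', 'active'),
        ('b@example.com', 'active')
     ) AS t (email, subscription_status)
-- Rows are sorted by email, then by status; DISTINCT ON keeps the first row
-- of each email group, so for a@example.com the 'active' row is kept
-- because it sorts before 'unsubscribed'.
ORDER BY email, subscription_status;
```

Changing the second sort key (or adding DESC) changes which duplicate wins, so the ORDER BY is where the "which row do I want" decision is made.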

7qhs6swi 4#

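Insert into a temp table, grouping by the key so that duplicates are removed.
Then insert only where the key does not already exist.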
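
The two steps above could be sketched roughly as follows. All names are hypothetical, a plain `INSERT` stands in for the `COPY`, and `min(payload)` is just one possible way to pick a value per key; with several non-key columns, an aggregate per column can mix values from different input rows, so `DISTINCT ON` may be a better fit there.

```sql
-- Hypothetical names; assumes the target has a primary key called id.
CREATE TABLE IF NOT EXISTS main_table (id integer PRIMARY KEY, payload text);

BEGIN;
CREATE TEMP TABLE staging (id integer, payload text) ON COMMIT DROP;
-- Stand-in for COPY staging FROM '...'.
INSERT INTO staging VALUES (1, 'a'), (2, 'b'), (2, 'c');

-- Step 1: collapse the input to one row per key.
CREATE TEMP TABLE staging_dedup ON COMMIT DROP AS
SELECT id, min(payload) AS payload
FROM staging
GROUP BY id;

-- Step 2: insert only the keys that are absent from the target.
INSERT INTO main_table (id, payload)
SELECT d.id, d.payload
FROM staging_dedup d
WHERE NOT EXISTS (SELECT 1 FROM main_table m WHERE m.id = d.id);
COMMIT;
```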

2g32fytz 5#

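For using COPY FROM with protection against duplicates in both the target table and the source file (results verified on a local instance).
This should also work in Redshift, but I have not verified it.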

-- Target table
CREATE TABLE target_table
(id integer PRIMARY KEY, firstname varchar(100), lastname varchar(100));
INSERT INTO target_table (id, firstname, lastname) VALUES (14, 'albert', 'einstein');
INSERT INTO target_table (id, firstname, lastname) VALUES (4, 'isaac', 'newton');

-- COPY FROM with protection against duplicates in the target table as well as in the source file
BEGIN;
  CREATE TEMP TABLE source_file_table ON COMMIT DROP AS (
    SELECT * FROM target_table
  )
  WITH NO DATA;

  -- Simulating COPY FROM
  INSERT INTO source_file_table (id, firstname, lastname) VALUES (14, 'albert', 'einstein');
  INSERT INTO source_file_table (id, firstname, lastname) VALUES (7, 'marie', 'curie');
  INSERT INTO source_file_table (id, firstname, lastname) VALUES (7, 'marie', 'curie');
  INSERT INTO source_file_table (id, firstname, lastname) VALUES (7, 'marie', 'curie');
  INSERT INTO source_file_table (id, firstname, lastname) VALUES (5, 'Neil deGrasse', 'Tyson');

  -- for protection against duplicates in target_table
  UPDATE source_file_table SET id=NULL
  FROM target_table WHERE source_file_table.id=target_table.id;

  INSERT INTO target_table
  SELECT * FROM source_file_table
  -- for protection against duplicates in target_table
  WHERE source_file_table.id IS NOT NULL
  -- for protection against duplicates in the source file
  -- (UNION removes only rows that are identical in every column)
  UNION
  (SELECT * FROM source_file_table
   WHERE source_file_table.id IS NOT NULL
   LIMIT 1);
COMMIT;
