Insert new data from CSV into an existing PostgreSQL database

Posted by hof1towb on 2023-07-31 in PostgreSQL

This part of my code loops over a folder and imports multiple CSV files into a PostgreSQL database:

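import glob
import psycopg2

# Collect every CSV file in the folder; open() cannot read the folder path itself
file_names = glob.glob('C:/john/database_files/*.csv')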

con = psycopg2.connect(database="xxxx", user="xxxx", password="xxxx", host="xxxx")

for file_name in file_names:
    with open(file_name, 'r') as file_in:
        next(file_in)
        with con.cursor() as cur:
            cur.copy_from(file_in, "table_name", columns=('col1', 'col2', 'col3', 'col4', 'col5'), sep=",")
        con.commit()

con.close()

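If new CSV files with the same column headers are created later, I want to import only the data from those new files, so that it is appended to the same table in the database without overwriting anything. For example, if a new CSV file with 10 rows is created and my current table has 100 rows, the updated table should have 110 rows.
Many thanks!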

kjthegm6 (answer #1)

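I've updated the code so that it keeps a text file listing every CSV file that has already been processed. This lets the code read only the CSV files it has not seen before.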

import os
import psycopg2

dir_name = 'C:/john/database_files'
processed_files = []

# Load processed_files from a file, if it exists
try:
    with open('processed_files.txt', 'r') as f:
        processed_files = f.read().splitlines()
except FileNotFoundError:
    pass

con = psycopg2.connect(database="xxxx", user="xxxx", password="xxxx", host="xxxx")

# Loop over all CSV files in the directory
for file_name in os.listdir(dir_name):
    if file_name.endswith('.csv') and file_name not in processed_files:
        with open(os.path.join(dir_name, file_name), 'r') as file_in:
            next(file_in)
            with con.cursor() as cur:
                cur.copy_from(file_in, "table_name", columns=('col1', 'col2', 'col3', 'col4', 'col5'), sep=",")
            con.commit()

        # Mark this file as processed
        processed_files.append(file_name)

con.close()

# Save the list of processed files
with open('processed_files.txt', 'w') as f:
    for file_name in processed_files:
        f.write("%s\n" % file_name)
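
One possible refinement, offered as a sketch rather than a definitive fix: the code above writes processed_files.txt only once, at the very end, so if the script stops partway through, files that were already committed are not recorded and would be imported again on the next run. The version below appends each file name to the log right after it loads. The function names `load_processed` and `import_new_files`, and the `load_file` callback, are hypothetical names introduced for illustration; they are not part of psycopg2.

```python
import os


def load_processed(log_path):
    """Return the set of file names recorded in the log, or an empty set."""
    try:
        with open(log_path, 'r') as f:
            return set(f.read().splitlines())
    except FileNotFoundError:
        return set()


def import_new_files(dir_name, log_path, load_file):
    """Call load_file(path) for each unseen CSV file and log it right away.

    load_file is a hypothetical callback. It should raise on failure so that
    the file is not recorded and gets retried on the next run.
    """
    processed = load_processed(log_path)
    imported = []
    for file_name in sorted(os.listdir(dir_name)):
        if not file_name.endswith('.csv') or file_name in processed:
            continue
        load_file(os.path.join(dir_name, file_name))
        # Append immediately so a later crash does not lose this record
        with open(log_path, 'a') as log:
            log.write(file_name + "\n")
        processed.add(file_name)
        imported.append(file_name)
    return imported
```

With psycopg2, `load_file` could be a small function that opens the path, skips the header with `next(file_in)`, runs the same `cur.copy_from(...)` call as above, and then calls `con.commit()`.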
