How can I keep the data rows when merging CSV files?

Posted by 62o28rlo on 2023-04-03 in Other

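I'm merging a number of CSV files into one large database. The files all have similar but not identical columns: some files have one more column than the others. I managed to create an output CSV file that contains all of the column names, but no data appears in it. Here is my code: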

import csv

inputs = ['data_file1.csv', 'data_file2.csv', 'data_file3.csv',
      'data_file4.csv', 'data_file5.csv', 'data_file6.csv',
      'data_file7.csv', 'data_file8.csv', 'data_file9.csv', 
      'data_file10.csv', 'data_file11.csv', 'data_file12.csv',
      'data_file13.csv', 'data_file14.csv', 'data_file15.csv',
      'data_file16.csv']  

fieldnames = []

for filename in inputs:
    with open(filename, "r", newline="") as f_in:
        reader = csv.reader(f_in)
        headers = next(reader)
        for h in headers:
            if h not in fieldnames:
                fieldnames.append(h)

with open("out.csv", "w", newline="") as f_out:
    writer = csv.DictWriter(f_out, fieldnames=fieldnames)
    writer.writeheader() #this is the addition.       
    for filename in inputs:
        with open(filename, "r", newline="") as f_in:
            reader = csv.DictReader(f_in)  # Uses the field names in this file
    for line in reader:
        writer.writerow(line)

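I ran the code above, but no data shows up in the output file; only the header row is there.
Does anyone know how to fix this? I also tried creating a separate DataFrame for each CSV file and then merging them, but that produced a database with doubled columns and, again, no rows. Any help is much appreciated!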
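On the DataFrame attempt: pd.merge performs a database-style join, which would plausibly explain both symptoms, depending on the arguments used. Overlapping non-key columns get the _x/_y suffixes (doubled columns), and the default inner join drops every row whose key has no match (no rows). Stacking files on top of each other is what pd.concat does instead. A minimal sketch, assuming pandas is installed and using two small hypothetical in-memory files in place of the real data_file*.csv:

```python
import io

import pandas as pd

# Hypothetical stand-ins for two of the data files; the second has an extra column.
csv_a = "id,name\n1,alice\n2,bob\n"
csv_b = "id,name,age\n3,carol,41\n"

frames = [pd.read_csv(io.StringIO(text)) for text in (csv_a, csv_b)]

# concat stacks the rows and aligns columns by name; a column that a file
# lacks is filled with NaN for that file's rows. ignore_index renumbers rows 0..n-1.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

With the real files this would become combined = pd.concat([pd.read_csv(name) for name in inputs], ignore_index=True), followed by combined.to_csv("out.csv", index=False).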

Answer #1, by 68de4m5k

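The data doesn't show up because you iterate with for line in reader: outside the file-reading block with open(filename, "r", newline="") as f_in:. When the with block ends, the file is closed, so the reader can no longer read rows from it. (In the code as posted, the loop is even outside the for filename in inputs: loop, so at most it would only ever see the last file's reader.) The loop has to run while the file is still open. Here is code that should work: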

import csv

inputs = ['data_file1.csv', 'data_file2.csv', 'data_file3.csv',
          'data_file4.csv', 'data_file5.csv', 'data_file6.csv',
          'data_file7.csv', 'data_file8.csv', 'data_file9.csv',
          'data_file10.csv', 'data_file11.csv', 'data_file12.csv',
          'data_file13.csv', 'data_file14.csv', 'data_file15.csv',
          'data_file16.csv']

# First pass: collect every column name across all files, in first-seen order.
fieldnames = []

for filename in inputs:
    with open(filename, "r", newline="") as f_in:
        reader = csv.reader(f_in)
        headers = next(reader)
        for h in headers:
            if h not in fieldnames:
                fieldnames.append(h)

# Second pass: copy the data rows of every file into one output file.
with open("out.csv", "w", newline="") as f_out:
    writer = csv.DictWriter(f_out, fieldnames=fieldnames)
    writer.writeheader()  # write the combined header row once
    for filename in inputs:
        with open(filename, "r", newline="") as f_in:
            reader = csv.DictReader(f_in)  # Uses the field names in this file
            # Read the rows here, while the file is still open.
            # Columns this file lacks are written as empty cells.
            for line in reader:
                writer.writerow(line)
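If the merge needs to run more than once, the same two passes could be wrapped in a function. The sketch below uses a hypothetical helper, merge_csv_files (not part of the original answer), that also exposes csv.DictWriter's restval parameter, which controls what is written into cells for columns a file doesn't have (the default is an empty string). The temporary-file setup at the end only exists to make the example self-contained.

```python
import csv
import os
import tempfile


def merge_csv_files(inputs, output, restval=""):
    """Hypothetical helper: concatenate rows from CSV files whose columns may differ.

    The output header is the union of all input headers in first-seen order;
    cells for columns a file lacks are filled with `restval`.
    Returns the number of data rows written.
    """
    fieldnames = []
    for filename in inputs:
        with open(filename, newline="") as f_in:
            # next(..., []) tolerates a completely empty file.
            for h in next(csv.reader(f_in), []):
                if h not in fieldnames:
                    fieldnames.append(h)

    rows_written = 0
    with open(output, "w", newline="") as f_out:
        writer = csv.DictWriter(f_out, fieldnames=fieldnames, restval=restval)
        writer.writeheader()
        for filename in inputs:
            with open(filename, newline="") as f_in:
                # The loop stays inside the with block, so the file is still open.
                for row in csv.DictReader(f_in):
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


# Usage with two small hypothetical files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.csv")
    b = os.path.join(tmp, "b.csv")
    with open(a, "w", newline="") as f:
        f.write("id,name\n1,alice\n2,bob\n")
    with open(b, "w", newline="") as f:
        f.write("id,name,age\n3,carol,41\n")

    out = os.path.join(tmp, "out.csv")
    n = merge_csv_files([a, b], out, restval="NA")
    with open(out, newline="") as f:
        merged = list(csv.reader(f))

print(n, merged)
```

Counting the rows written is a cheap sanity check: if it comes back as 0, the reading loop is not actually running over the open files.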
