从一个csv文件读取并根据内容写入多个不同的csv文件

zpjtge22  于 2023-01-15  发布在  其他
关注(0)|答案(3)|浏览(240)

我正在读入一个很大的csv文件。根据column1中字符串的前两个值,我希望将内容输出到不同的csv文件中。
文件可能如下所示:

Column1;Column2
01;BE
02;ED
12;FD
14;DS
03;ED
04;DF

我的代码如下:

import csv
output_path=r'C:\myfolder\large_file.csv'

with open(os.path.join(os.path.dirname(output_path),"column1_01.csv"), "w", encoding="utf-8", newline='') as \
out_01, open(os.path.join(os.path.dirname(output_path),"column1_02.csv"), "w", encoding="utf-8", newline='') as \
out_02, open(os.path.join(os.path.dirname(output_path),"column1_03.csv"), "w", encoding="utf-8", newline='') as \
out_03, open(os.path.join(os.path.dirname(output_path),"column1_04.csv"), "w", encoding="utf-8", newline='') as \
out_04:
    
    cw01 = csv.writer(out_01, delimiter=";", quoting=csv.QUOTE_MINIMAL)
    cw02 = csv.writer(out_02, delimiter=";", quoting=csv.QUOTE_MINIMAL)
    cw03 = csv.writer(out_03, delimiter=";", quoting=csv.QUOTE_MINIMAL)
    cw04 = csv.writer(out_04, delimiter=";", quoting=csv.QUOTE_MINIMAL)

    with open(output_path, encoding="utf-8") as in_f:
        cr = csv.reader(in_f, delimiter=";")
        cw01.writerow(next(cr))
        cw02.writerow(next(cr))
        cw03.writerow(next(cr))
        cw04.writerow(next(cr))

        for line in cr:
            if (line[0][:2] =="01"): cw01.writerow(line)  
            if (line[0][:2] =="02"): cw02.writerow(line)  
            if (line[0][:2] =="03"): cw03.writerow(line)  
            if (line[0][:2] =="04"): cw04.writerow(line)

我现在的问题是,当我添加下一行,"05"的输出时,我得到了一个错误stopiteration(我用样本数据检查了它,其中没有太多的数据行,所以我猜它一定与它有关)。此外,主要问题是,当检查文件时,我可以看到它不工作。它没有正确地将记录输出到正确的文件中。我的错误在哪里?
我想问题可能是所有的下一个( chrome )。我想做的是只是使用头从原来的大csv和有它的每一个csv。头是一样的。我需要所有的文件在年底与头。
我需要一个纯csv的解决方案。没有其他软件包。
我想在我的代码中找到准确的错误。为什么这不起作用?错误在哪里?我不想做任何进一步的修改,引入异常处理,如try,函数,如def或任何其他。我不需要一个通用的解决方案。我想在我的代码中找到具体的错误。
原来的大文件有很多列,所以标题相当长。因此,我想要一个解决方案,我不必手动键入所有列将其添加到标题。

kiayqfof

kiayqfof1#

Python内置函数next通过调用__next__()方法从迭代器中获取下一个元素,这样,迭代器就会向前移动到下一个元素,所以你需要多次调用next来获取头部。

[...]
    with open(output_path, 'r', encoding="utf-8") as in_f:
        cr = csv.reader(in_f, delimiter=";")
        header = next(cr)
        cw01.writerow(header)
        cw02.writerow(header)
        cw03.writerow(header)
        cw04.writerow(header)

        for line in cr:
[...]

希望能有所帮助。

tjvv9vkg

tjvv9vkg2#

当你这样做的时候会发生什么呢?我把你的代码打乱了,使它成为通用的,这样你就可以给予任何cols的列表,它会为每个col生成一个输出文件,并在csv编写器的字典中查找col。

import csv
output_path = r'C:\myfolder\large_file.csv'

def get_fhandle(col_name):
  return open(
    os.path.join(os.path.dirname(output_path), f"column1_{col_name}.csv"),
    "w", encoding="utf-8", newline=''
  )

cols = ["01", "02", "03", "04", "05"]
files = {col: get_fhandle(col) for col in cols}
try:
  writers = {
    col: csv.writer(file, delimiter=";", quoting=csv.QUOTE_MINIMAL)
    for col, file in files.items()
  }
  with open(output_path, encoding="utf-8") as in_f:
    cr = csv.reader(in_f, delimiter=";")
    header = next(cr)
    for writer in writers.values():
      writer.writerow(header)
    for line in cr:
      col = line[0][:2]
      if col not in writers:
        raise ValueError(
          f"'{col}' is not a known column. Should be: {cols.join(', ')}"
        )
      writers[col].writerow(line)
finally:
  for file in files.values():
    file.close()
igsr9ssn

igsr9ssn3#

使用contextlib.ExitStack()可以使事情得到实质性的清理,特别是如果我们建立一个小字典来Map输入中的哪些键Map到我们想要写入的输出文件。

import contextlib
import csv

files_out = {
    "01": "column1_01.csv",
    "02": "column1_02.csv",
    "03": "column1_03.csv",
    "04": "column1_04.csv",
}

with contextlib.ExitStack() as stack:
    ## ---------------------
    ## Set up a bunch of csv writers
    ## ---------------------
    writers = {
        key: csv.writer(stack.enter_context(open(value, "w", newline="")))
        for key, value
        in files_out.items()
    }
    ## ---------------------

    file_in = stack.enter_context(open("small.csv"))
    rows = csv.reader(file_in, delimiter=";")

    ## ---------------------
    ## write a header to each output file
    ## ---------------------
    headers = next(rows)
    for writer in writers.values():
        writer.writerow(headers)
    ## ---------------------

    ## ---------------------
    ## write rows to the appropriate file
    ## ---------------------
    for row in rows:
        key = row[0]
        if not key in writers:
            print(f"no file defined for key {key}")
            continue
        writers[key].writerow(row)
    ## ---------------------

要回答您的代码当前无法正常工作的具体问题,让我们来看一下:

with open(output_path, encoding="utf-8") as in_f:
    cr = csv.reader(in_f, delimiter=";")
    cw01.writerow(next(cr))
    cw02.writerow(next(cr))
    cw03.writerow(next(cr))
    cw04.writerow(next(cr))

    for line in cr:
        if (line[0][:2] =="01"): cw01.writerow(line)  
        if (line[0][:2] =="02"): cw02.writerow(line)  
        if (line[0][:2] =="03"): cw03.writerow(line)  
        if (line[0][:2] =="04"): cw04.writerow(line)

此处:

cw01.writerow(next(cr))

将输入文件的第一行(头)写到第一个输出文件中。然后我们重复这个过程,将连续的输入行写到输出文件中。
在将输入的前4行写入到四个输出中的每一个输出中的一行之后,我们读取输入文件中的其余行(示例中的最后两行),并根据需要将它们写入文件3和4
因此,您的具体解决方案是使用next()一次从输入文件中获取头文件,然后将其写入每个输出文件:

with open(output_path, encoding="utf-8") as in_f:
    cr = csv.reader(in_f, delimiter=";")
    headers = next(cr)

    cw01.writerow(headers)
    cw02.writerow(headers)
    cw03.writerow(headers)
    cw04.writerow(headers)

    for line in cr:
        if (line[0][:2] =="01"): cw01.writerow(line)  
        if (line[0][:2] =="02"): cw02.writerow(line)  
        if (line[0][:2] =="03"): cw03.writerow(line)  
        if (line[0][:2] =="04"): cw04.writerow(line)

相关问题