New lines from a text file to columns in a CSV

5cnsuln7  posted on 2023-07-31  in  Other

I am new to programming, so please bear with me.
I have a .txt file that I am reading. The file contains elements in the following format:

* Name: name_1
* Error Text: some_text_1
* Correct Text: some_text_2
* Type of Error: some_text_3

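This format repeats throughout the file.
I want to write this file to a CSV whose columns are the elements listed above, like this:
| Name | Error Text | Correct Text | Type of Error |
| -- | -- | -- | -- |
| name_1 | some_text_1 | some_text_2 | some_text_3 |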

with open(filename) as infile, open('outfile.csv','w') as outfile:  
    for line in infile: 
        outfile.write(line)

This is all I could come up with; the code just copies the text into the CSV unchanged.

ipakzgxi #1

To create a CSV file with named columns from the text, you can use csv.DictWriter:

import csv

# Set fieldnames for the CSV file
fieldnames = ['Name', 'Error Text', 'Correct Text', 'Type of Error']

# convert data to dictionaries 
rows = []

with open('yourfile.txt', 'r') as infile:
    # Skip blank lines so each group of four lines is exactly one record
    lines = [line for line in infile if line.strip()]
    for i in range(0, len(lines), 4):  # Process four lines at a time
        # split(":", 1) splits at the first colon only, so values may contain ":"
        name = lines[i].split(":", 1)[1].strip()
        error_text = lines[i + 1].split(":", 1)[1].strip()
        correct_text = lines[i + 2].split(":", 1)[1].strip()
        type_of_error = lines[i + 3].split(":", 1)[1].strip()
        rows.append({'Name': name, 'Error Text': error_text, 'Correct Text': correct_text, 'Type of Error': type_of_error})

# Write CSV file
with open('yourfileoutput.csv', 'w', newline='') as outfile:
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
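
As a side note on how csv.DictWriter builds the columns: it places each value by key, following the order of fieldnames, and fills any missing key with restval. The sketch below is only an illustration; the sample records and the in-memory io.StringIO buffer (standing in for the output file) are assumptions, not part of the original answer.

```python
import csv
import io

fieldnames = ['Name', 'Error Text', 'Correct Text', 'Type of Error']

# Hypothetical records: the first has its keys in a different order,
# the second is missing two fields.
rows = [
    {'Type of Error': 'some_text_3', 'Name': 'name_1',
     'Error Text': 'some_text_1', 'Correct Text': 'some_text_2'},
    {'Name': 'name_2', 'Error Text': 'some_text_4'},
]

buffer = io.StringIO()  # in-memory stand-in for the output file
writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                        restval='', lineterminator='\n')
writer.writeheader()     # header row comes from fieldnames
writer.writerows(rows)   # values are placed by key, not by position
csv_text = buffer.getvalue()
print(csv_text)
```

The second record ends up with two empty trailing columns. A key that is not in fieldnames raises ValueError by default (extrasaction='raise').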


a9wyjsp7 #2

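Read the input file line by line. Split each line at the colon. Check that the keywords appear in the expected order. Build a dictionary of the input data.
Then open an output file (CSV) and dump the column headers and the dictionary contents, like this: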

from itertools import cycle

COLUMNS = ['Name', 'Error Text', 'Correct Text', 'Type of Error']
TXTFILE = '/Volumes/G-Drive/input.txt'
CSVFILE = '/Volumes/G-Drive/output.csv'
SEP = '|'

d = {}

column = cycle(COLUMNS)

with open(TXTFILE) as indata:
    for line in indata:
        k, v = line.split(':', 1)  # split at the first colon only, so values may contain ':'
        if k == next(column):
            d.setdefault(k, []).append(v.strip())
        else:
            raise ValueError(f'Unexpected keyword "{k}"')

with open(CSVFILE, 'w') as outdata:
    print(SEP.join(COLUMNS), file=outdata)
    for data in zip(*d.values()):
        print(SEP.join(data), file=outdata)
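
Joining the fields by hand works as long as no value contains the separator. If that might happen, one possible variant is to let csv.writer do the quoting. This is a minimal sketch: the sample dictionary d and the io.StringIO buffer are assumptions chosen for illustration, with one value deliberately containing '|'.

```python
import csv
import io

COLUMNS = ['Name', 'Error Text', 'Correct Text', 'Type of Error']
SEP = '|'

# Hypothetical parsed data in the same shape as the dictionary d above;
# the first 'Error Text' value contains the separator on purpose.
d = {
    'Name': ['name_1', 'name_2'],
    'Error Text': ['a|b', 'some_text_1'],
    'Correct Text': ['some_text_2', 'some_text_3'],
    'Type of Error': ['typo', 'grammar'],
}

buffer = io.StringIO()  # in-memory stand-in for the output file
writer = csv.writer(buffer, delimiter=SEP, lineterminator='\n')
writer.writerow(COLUMNS)
# Transpose the per-column lists into per-record rows
writer.writerows(zip(*(d[c] for c in COLUMNS)))
csv_text = buffer.getvalue()
print(csv_text)

# Reading it back recovers the original value, separator included
parsed = list(csv.reader(io.StringIO(csv_text), delimiter=SEP))
```

With the default minimal quoting, only the field that contains the separator is wrapped in double quotes.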


ou6hu8tu #3

Assuming the input file is:

Name: name_1
Error Text: some_text_1_1
Correct Text: some_text_1_2
Type of Error: some_text_1_3
Name: name_2
Error Text: some_text_2_1
Correct Text: some_text_2_2
Type of Error: some_text_2_3
Name: name_3
Error Text: some_text_3_1
Correct Text: some_text_3_2
Type of Error: some_text_3_3

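Just use pure pandas: read_csv with ': ' as the separator, then assign a de-duplicating column based on a group that restarts at each "Name" (with the help of cumsum), and finally pivot and export with to_csv: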

import pandas as pd

df = (pd.read_csv(filename, sep=r':\s+', engine='python', header=None)
        .assign(idx=lambda d: d[0].eq('Name').cumsum())
        .pivot(index='idx', columns=0, values=1)
     )
df.to_csv('outfile.csv', index=False)


Output file:

Correct Text,Error Text,Name,Type of Error
some_text_1_2,some_text_1_1,name_1,some_text_1_3
some_text_2_2,some_text_2_1,name_2,some_text_2_3
some_text_3_2,some_text_3_1,name_3,some_text_3_3


To keep the original column order:

df = (pd.read_csv(filename, sep=r':\s+', engine='python', header=None)
        .assign(idx=lambda d: d[0].eq('Name').cumsum())
        .pipe(lambda d: d.pivot(index='idx', columns=0, values=1)[d[0].unique()])
     )
df.to_csv('outfile.csv', index=False)


Output:

Name,Error Text,Correct Text,Type of Error
name_1,some_text_1_1,some_text_1_2,some_text_1_3
name_2,some_text_2_1,some_text_2_2,some_text_2_3
name_3,some_text_3_1,some_text_3_2,some_text_3_3
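
The grouping idea in this answer (a counter that goes up by one at every "Name" line, so all lines of one record share an id) can be sketched in plain Python to see what the idx column holds. This is only an illustration of the idea, not the pandas code itself; itertools.accumulate stands in for cumsum, and the shortened sample lines are an assumption.

```python
from itertools import accumulate

# Hypothetical sample: the first two records of the input shown above
lines = [
    "Name: name_1",
    "Error Text: some_text_1_1",
    "Correct Text: some_text_1_2",
    "Type of Error: some_text_1_3",
    "Name: name_2",
    "Error Text: some_text_2_1",
    "Correct Text: some_text_2_2",
    "Type of Error: some_text_2_3",
]

# Split each line into (key, value) at the first colon
pairs = [tuple(part.strip() for part in line.split(":", 1)) for line in lines]

# Running count of "Name" keys, the plain-Python analogue of eq('Name').cumsum()
idx = list(accumulate(int(key == "Name") for key, _ in pairs))
print(idx)  # [1, 1, 1, 1, 2, 2, 2, 2]

# Collect the values per record id, which is roughly what pivot does
records = {}
for i, (key, value) in zip(idx, pairs):
    records.setdefault(i, {})[key] = value
```

Because the id changes only when a "Name" line appears, this grouping does not depend on every record having exactly four lines.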
