Python script to process a text file and create a CSV file

omhiaaxx  posted on 2023-01-28  in  Python

I have a text file that contains data in a specific format, like this:

Name : John Doe
age  : 30
Job  : Accountant

I need this data in CSV format, like this:

John Doe,30,Accountant

What is the best way to process this data in Python and get the desired result?

von4xj4u #1

You could try something like this:

with open('somefile.txt') as f:
    lines = f.readlines()
    print(lines)

vals = []
for line in lines:
    # Keep only the text after the last colon, with surrounding whitespace removed
    vals.append(line.rpartition(':')[2].strip())

# Join the values with commas to build the CSV row
row = ",".join(vals)
print(row)

Output:
['Name : John Doe\n', 'age  : 30\n', 'Job  : Accountant']
John Doe,30,Accountant

You can also write it to a CSV file, like this:

with open('data.csv', 'w') as out:
    # row is already a comma-separated string, so it can be written as-is
    out.write(row + '\n')
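
If a value might itself contain a comma (a job title such as "Accountant, senior", say), joining the values by hand would produce a malformed row. Below is a minimal sketch that uses the standard `csv` module, which quotes such fields automatically. The helper name `values_from_lines` and the in-memory buffer are illustrative assumptions, not part of the answer above, and the sketch splits on the first colon with `partition` rather than on the last one.

```python
import csv
import io


def values_from_lines(lines):
    """Return the text after the first colon of each non-blank line."""
    values = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # partition splits on the first colon only, so a value such as
        # "10:30" is kept whole
        _, _, value = line.partition(":")
        values.append(value.strip())
    return values


# Sample input taken from the question
sample = ["Name : John Doe\n", "age  : 30\n", "Job  : Accountant\n"]

# The buffer stands in for open("data.csv", "w", newline="")
buffer = io.StringIO()
csv.writer(buffer).writerow(values_from_lines(sample))
print(buffer.getvalue(), end="")
```

With a real file, the buffer would be replaced by `open('data.csv', 'w', newline='')`, as the `csv` module documentation recommends.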

qyyhg6bp #2

I'm assuming you want a newline after each record in the CSV file, so this should do it:

with open('file1.txt') as f_in:
    lines = (line for line in f_in)
    with open('csv_out.txt', 'w') as f_out:
        for line_num, line in enumerate(lines, start=1):
            # Obtain required data without any white space
            line = line.strip().split(':')[1][1:]
            if line_num % 3 == 0:
                # On every 3rd line add a newline; else add the required comma between the fields
                line += '\n'
            else:
                line += ','
            f_out.write(line)

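I've also assumed that your source file looks exactly as stated in your question. If you do *not* have a space after the colon, just remove the [1:] slice from line 6.
Using a generator expression on the second line means the whole source file does not have to be loaded into memory, so it does not matter how big the source file is (this may or may not be relevant to you). A file object is itself a lazy iterator, so looping over f_in directly would behave the same way.
If for some reason (now or later) you need to hold the source in memory, you can replace the second line with: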

lines = f_in.readlines()
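
The three-lines-per-record idea above can also be sketched as a generator that yields one list of values per record while still reading the input lazily. This is only an illustration under the same assumption of a fixed number of fields per record; the name `iter_records`, its `fields_per_record` parameter, and the second sample record are made up for the example.

```python
import csv
import io
from itertools import islice


def iter_records(lines, fields_per_record=3):
    """Yield one list of values per group of non-blank "key : value" lines."""
    # Lazy stream of values: the text after the first colon of each line
    values = (
        line.partition(":")[2].strip()
        for line in lines
        if line.strip()
    )
    while True:
        # Pull the next group of values from the stream
        record = list(islice(values, fields_per_record))
        if not record:
            return
        yield record


# In-memory stand-ins for the input and output files
source = io.StringIO(
    "Name : John Doe\nage  : 30\nJob  : Accountant\n"
    "Name : Jane Roe\nage  : 41\nJob  : Engineer\n"
)
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(iter_records(source))
print(out.getvalue(), end="")
```

Because `strip()` removes the padding around each value, this version does not depend on whether a space follows the colon.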

xn1cxnb4 #3

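import pandas as pd

# Read the text file as "field : value" pairs, splitting each line on the colon
df = pd.read_csv("input.txt", sep=":", header=None,
                 names=["field", "value"], skipinitialspace=True)

# Strip the padding around the field names, then turn the pairs into a single row
df["field"] = df["field"].str.strip()
row = df.set_index("field").T

# Export the row to a CSV file, without the header or index
row.to_csv("output.csv", index=False, header=False)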
