csv: Reading a text file as a DataFrame with pandas

kjthegm6, posted on 2023-09-28 in Other

I have a text file with several hundred lines, where each line looks like this:

"LastName, FirstName MiddleName", 222555,X-150,2023,0.15,0.20,0.5,"1, 2, 10",--,1.5,5.10,report

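The delimiter is normally a comma, except when the comma is inside quotes. Each line above needs to be split into the following columns: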

LastName, FirstName MiddleName
222555
X-150
2023
0.15
0.20
0.5
1,2,10
--
1.5
5.10
report
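
The split listed above matches standard CSV quoting rules, so it can be checked without pandas. Below is a minimal sketch using Python's built-in `csv` module; the variable names `line` and `fields` are only illustrative, and stripping whitespace from each field is an assumption about how the leading space before 222555 should be handled.

```python
import csv
from io import StringIO

line = '"LastName, FirstName MiddleName", 222555,X-150,2023,0.15,0.20,0.5,"1, 2, 10",--,1.5,5.10,report'

# csv.reader keeps commas inside double quotes as part of the field
fields = next(csv.reader(StringIO(line)))

# Assumption: surrounding whitespace (e.g. the space before 222555) is not wanted
fields = [f.strip() for f in fields]

for field in fields:
    print(field)
```

This yields twelve fields per line, with the quoted name and the quoted number list each kept as a single field.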
ubby3x7f #1

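If you want to use pandas, you can try the following. A plain read_csv should handle your data well, because commas inside double quotes are not treated as delimiters. You can then split the quoted columns into their own columns and, finally, drop the original quoted columns: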

# Build a mock CSV/text file for demonstration
from io import StringIO

import pandas as pd

line = '"LastName, FirstName MiddleName", 222555,X-150,2023,0.15,0.20,0.5,"1, 2, 10",--,1.5,5.10,report'
csv_file = StringIO('\n'.join([line] * 3))

# Load the file as a CSV; there is no header row, so columns are numbered 0 to 11
df = pd.read_csv(csv_file, header=None)

# Split the name into separate columns (assumes exactly three name parts)
df[['Last', 'First', 'Middle']] = df[0].str.replace(',', '').str.split(' ', expand=True)

# Split the quoted numbers into 3 separate columns (assumes exactly three values)
df[['num1', 'num2', 'num3']] = df[7].str.split(', ', expand=True)

# Drop the original columns that were split
df = df.drop(columns=[0, 7])

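Output: a DataFrame with 3 rows and 16 columns, namely the remaining original columns 1 to 6 and 8 to 11 plus Last, First, Middle, num1, num2 and num3.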
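
As a possible variation on the approach above, here is a sketch that names the columns up front and lets read_csv clean the data while loading. The column names in `names` are hypothetical (the original post does not name them), and treating `--` as a missing value is an assumption. The name is split only on the first comma, so a first or middle name with a different number of words does not break the split.

```python
from io import StringIO

import pandas as pd

line = '"LastName, FirstName MiddleName", 222555,X-150,2023,0.15,0.20,0.5,"1, 2, 10",--,1.5,5.10,report'
csv_file = StringIO('\n'.join([line] * 3))

# Hypothetical column names; the original post does not name the columns
names = ['name', 'id', 'model', 'year', 'v1', 'v2', 'v3',
         'nums', 'flag', 'v4', 'v5', 'kind']

df = pd.read_csv(
    csv_file,
    header=None,
    names=names,
    skipinitialspace=True,  # ignore the space after a delimiter, e.g. before 222555
    na_values=['--'],       # assumption: "--" marks a missing value
)

# Split "LastName, FirstName MiddleName" on the first comma only
df[['Last', 'Given']] = df['name'].str.split(', ', n=1, expand=True)

# Turn the quoted "1, 2, 10" field into three integer columns
df[['num1', 'num2', 'num3']] = df['nums'].str.split(', ', expand=True).astype(int)

# Drop the original columns that were split
df = df.drop(columns=['name', 'nums'])
print(df.dtypes)
```

Whether `--` should really become NaN depends on what that placeholder means in the real file; drop the `na_values` argument to keep it as text.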