csv: How to parse the lines below in Python so that the third column is stored as a list

mzaanser  posted on 2023-07-31  in  Python

How can I parse these lines using pandas or the csv module?

col1, col2, col3 <br>
name, date, ["data"] <br>
name, date, ["data", "data2", "data3"]  <br>
name, date, ["data1", "data2"] <br>

This is the format of the file.
If I use

pd.read_csv(file)


I get this error:

pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 5
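
A minimal sketch of why the tokenizer complains, using the standard csv module with its default comma delimiter (the three sample lines are copied from the file above; the variable names are only illustrative). The header and the first data row split into 3 fields, but the commas inside the brackets push the next row to 5:

```python
import csv

# Sample lines copied from the file shown in the question.
lines = [
    'col1, col2, col3 <br>',
    'name, date, ["data"] <br>',
    'name, date, ["data", "data2", "data3"]  <br>',
]

# Count the fields a plain comma split produces for each line.
field_counts = [len(row) for row in csv.reader(lines)]
print(field_counts)  # [3, 3, 5]
```

The third line of the file is the first one with more fields than the header, which matches the "Expected 3 fields in line 3, saw 5" message.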

cotxawn7  1#

Try delimiter=r', (?![^\[]*[\]])' to ignore the commas between square brackets:

import io

import pandas as pd

data = '''col1, col2, col3 <br>
name, date, ["data"] <br>
name, date, ["data", "data2", "data3"]  <br>
name, date, ["data1", "data2"] <br>'''

# Split on ", " only when it is not inside square brackets
df = pd.read_csv(io.StringIO(data), delimiter=r', (?![^\[]*[\]])', engine="python")
print(df)

Output:

   col1   col2                          col3 <br>
0  name   date                      ["data"] <br>
1  name   date   ["data", "data2", "data3"]  <br>
2  name   date            ["data1", "data2"] <br>


To remove the <br> tags:

# To remove <br> tags from each line
df.rename(columns={'col3 <br>':'col3'}, inplace=True)
df['col3'] = df['col3'].apply(lambda x : x.replace(' <br>', '').strip())

>>> output
   col1  col2                        col3
0  name  date                    ["data"]
1  name  date  ["data", "data2", "data3"]
2  name  date          ["data1", "data2"]
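
At this point col3 still holds strings that merely look like lists. A minimal sketch of turning them into real Python lists with ast.literal_eval, assuming every cell is a valid list literal as in the sample data (the small DataFrame below is rebuilt by hand so the snippet runs on its own):

```python
import ast

import pandas as pd

# Stand-in for the cleaned DataFrame shown above.
df = pd.DataFrame({
    "col1": ["name", "name", "name"],
    "col2": ["date", "date", "date"],
    "col3": ['["data"]', '["data", "data2", "data3"]', '["data1", "data2"]'],
})

# Safely evaluate each string as a Python literal, yielding a list per row.
df["col3"] = df["col3"].apply(ast.literal_eval)
print(df["col3"].iloc[1])  # ['data', 'data2', 'data3']
```

literal_eval only accepts Python literals, so it raises an error on a malformed cell rather than executing arbitrary code.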

kuuvgm7e  2#

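Because the third column holds list data in string form, consider using StringIO together with the converters parameter of read_csv, which converts the string representation into an actual list.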

import ast
from io import StringIO

import pandas as pd

# Your data
data = ...

# Wrap the string in a file-like object that read_csv can consume
data_file = StringIO(data)

# Converter function to convert the string representation of lists to actual lists
def parse_list(s):
    return ast.literal_eval(s)

df = pd.read_csv(data_file, converters={'col3': parse_list})
print(df)


beq87vna  3#

Another possible solution:

from io import StringIO

import pandas as pd

# text is assumed to hold the file contents shown in the question
df = pd.read_csv(StringIO(text), sep=r', (?!\")|\s+\<br\>',
             engine='python').dropna(axis=1)

Output:

   col1  col2                        col3
0  name  date                    ["data"]
1  name  date  ["data", "data2", "data3"]
2  name  date          ["data1", "data2"]
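
To see what that separator pattern does, here is a small sketch applying the same regex to one sample line with re.split (the variable names are illustrative). It splits on ", " unless a double quote follows, and also on the whitespace plus <br> at the end of the line, which leaves an empty trailing field:

```python
import re

# Same pattern as the sep argument above.
pattern = r', (?!\")|\s+\<br\>'
line = 'name, date, ["data", "data2", "data3"]  <br>'

fields = re.split(pattern, line)
print(fields)  # ['name', 'date', '["data", "data2", "data3"]', '']
```

That empty trailing field becomes an all-NaN fourth column in the DataFrame, which appears to be why the answer chains .dropna(axis=1).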
