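I'm a beginner working with a .txt file that contains dictionaries. I'm trying to use pd.read_csv to create a DataFrame in Pandas, but it throws the error `Error tokenizing data. C error: Expected 4 fields in line 2, saw 11`.
I believe I've found the root cause. The file is hard to parse because each line contains a dictionary whose key-value pairs are separated by commas, and the comma is also the delimiter here.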
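A quick illustration of that diagnosis, using a shortened, hypothetical row with the same layout: splitting on every comma yields more fields than the four in the header. Because the dict is always the last column, capping the number of splits keeps it in one piece. This is only a demonstration of the problem, not a full reader.

```python
# Hypothetical, shortened row with the same layout as store.txt
header = "id,name,storeid,report".split(",")
row = '11,JohnSmith,3221-123-555,{"Source":"online","FileFormat":0,"Isonline":true}'

# Splitting on every comma (roughly what the CSV tokenizer does when no
# field is quoted) breaks the JSON object apart.
naive = row.split(",")
print(len(header), len(naive))  # 4 6

# The dict is the last column, so allowing only len(header) - 1 splits
# keeps the whole object in the final field.
limited = row.split(",", len(header) - 1)
print(limited[-1])  # {"Source":"online","FileFormat":0,"Isonline":true}
```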
Data (store.txt):
id,name,storeid,report
11,JohnSmith,3221-123-555,{"Source":"online","FileFormat":0,"Isonline":true,"comment":"NAN","itemtrack":"110", "info": {"haircolor":"black", "age":53}, "itemsboughtid":[],"stolenitem":[{"item":"candy","code":1},{"item":"candy","code":1}]}
35,BillyDan,3221-123-555,{"Source":"letter","FileFormat":0,"Isonline":false,"comment":"this is the best store, hands down and i will surely be back...","itemtrack":"110", "info": {"haircolor":"black", "age":21},"itemsboughtid":[1,42,465,5],"stolenitem":[{"item":"shoe","code":2}]}
64,NickWalker,3221-123-555, {"Source":"letter","FileFormat":0,"Isonline":false, "comment":"we need this area to be fixed, so much stuff is everywhere and i do not like this one bit at all, never again...","itemtrack":"110", "info": {"haircolor":"red", "age":22},"itemsboughtid":[1,2],"stolenitem":[{"item":"sweater","code":11},{"item":"mask","code":221},{"item":"jack,jill","code":001}]}
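How can I read this csv file and create new columns based on the dictionary keys? Also, what if other data has more keys, for example more than 11 keys in the dictionary?
Is there an efficient way to create a df from the example above?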
Code I used when trying to read it as a csv:
import pandas as pd
df = pd.read_csv('store.txt', header=None)
I tried importing json and using a converter, and also converting all the commas to a `|`, but it didn't work:
import json
import pandas as pd
df = pd.read_csv('store.txt', converters={'report': json.loads}, header=0, sep="|")
I also tried:
import pandas as pd
import json
df=pd.read_csv('store.txt', converters={'report':json.loads}, header=0, quotechar="'")
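I also considered adding a quote at the beginning and end of each dictionary to turn it into a string, but I thought that would be too tedious, and I couldn't figure out how to find the closing bracket.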
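The quoting idea can be automated, and the closing bracket is easy to locate because the dict is always the last column: a greedy `.*` starting at the first `{` after a comma runs to the last `}` on the line. A minimal sketch, assuming nothing follows the dict on any row; the rows are shortened hypothetical samples and `quote_dict` is a hypothetical helper. It uses ordinary CSV quoting, where every `"` inside a quoted field must be doubled.

```python
import io
import json
import re

import pandas as pd

# Hypothetical, shortened rows in the same layout as store.txt
raw_lines = [
    "id,name,storeid,report",
    '11,JohnSmith,3221-123-555,{"Source":"online","info":{"age":53}}',
    '35,BillyDan,3221-123-555,{"Source":"letter","comment":"best store, hands down"}',
]

# The comma before the trailing dict (plus optional spaces), then greedily
# up to the last "}" on the line, so nested braces do not matter.
DICT_AT_END = re.compile(r",\s*(\{.*\})\s*$")


def quote_dict(line: str) -> str:
    # Standard CSV quoting: wrap the field in " and double each inner ".
    return DICT_AT_END.sub(
        lambda m: ',"' + m.group(1).replace('"', '""') + '"', line, count=1
    )


text = "\n".join(quote_dict(line) for line in raw_lines)
df = pd.read_csv(io.StringIO(text), converters={"report": json.loads})
print(df.loc[1, "report"]["comment"])  # best store, hands down
```

The header line has no `{`, so `quote_dict` leaves it unchanged.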
1 Answer
sqxo8psd:
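I think putting quotes around the dictionary is the right approach. You can do it with a regex, and use a quote character other than `"` (I used `§` in my example).

Note: the last value in the csv is not valid JSON: `"code":001`. It should be `"code":"001"` or `"code":1`.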
Output: