Python CSV read stops on invalid characters

Posted by aiqt4smr on 2023-06-19 in Python

I am reading a CSV file and writing it to MongoDB as JSON using the code below:

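import pandas as pd

bucket_name = "my-bucket"
file_name = "input.csv"
# Read the CSV from Google Cloud Storage; read_csv already returns a DataFrame
csv_data = pd.read_csv(f"gs://{bucket_name}/{file_name}", encoding="utf-8")
# Convert each row to a dict, then insert the rows as MongoDB documents
# (db is the existing database handle, created elsewhere)
records = csv_data.to_dict(orient="records")
db.task.insert_many(records)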

Here is the input CSV file:

ID,Name,Identity,Alignment,EyeColor,HairColor,Gender,Status,Appearances,FirstAppearance,Year,Universe
100001,Claude Potier (Earth-616),Secret,Neutral,Hazel,Brown,Male,Living,2,àå÷-00,2000,DC
100002,Elektra Natchios (Earth-616),Secret,Neutral,Blue,Black,Female,Living,280,éðå-81,1981,Marvel
100003,Thomas Williams (Earth-616),Secret,Neutral,Black,,Male,Living,1,àåâ-02,2002,DC
100004,Mogul (Earth-616),,,,Bald,,Living,,îàé-70,1970,DC

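It contains some characters that are not valid UTF-8. Is there a way to:
1. Replace these invalid characters with a placeholder such as "invalid", continue processing, and store the result in MongoDB?
2. Put it into the database?
The error I get: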

File "pandas/_libs/parsers.pyx", line 1499, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 0: invalid continuation byte
Answer #1 (gorkyyrv)

Try adding encoding_errors='ignore' (available in pandas 1.3 and later), which skips bytes that cannot be decoded as UTF-8:

csv_data = pd.DataFrame(pd.read_csv('gs://' + bucket_name + '/' + file_name, encoding='utf-8', encoding_errors='ignore'))
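To make the choice of error handler concrete, here is a minimal standard-library sketch of what each option does to one row. It is not the asker's pipeline: the sample bytes are an assumption reconstructed from the question (byte 0xe0 comes from the traceback; 0xe5 and 0xf7 are inferred from how the field is displayed), and the guess that the file was saved as Windows-1255 (cp1255) is also an assumption that should be checked against the real file. The helper `decode_row` is hypothetical.

```python
import csv
import io

# ASSUMPTION: a shortened row from the question, with the FirstAppearance
# bytes reconstructed as 0xe0 0xe5 0xf7 (only 0xe0 is confirmed by the error).
raw = b"100001,Claude Potier (Earth-616),\xe0\xe5\xf7-00,2000,DC\n"


def decode_row(data, encoding="utf-8", errors="strict"):
    """Decode one CSV line with the given error handler and split it into fields."""
    text = data.decode(encoding, errors=errors)
    return next(csv.reader(io.StringIO(text)))


# Strict decoding reproduces the failure reported in the question.
try:
    decode_row(raw)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# "ignore" silently drops the undecodable bytes (data is lost).
ignored = decode_row(raw, errors="ignore")

# "replace" keeps a visible U+FFFD marker where decoding failed.
replaced = decode_row(raw, errors="replace")

# ASSUMPTION: decoding with cp1255 instead of UTF-8 keeps the original text.
hebrew = decode_row(raw, encoding="cp1255")

print(strict_failed)
print(ignored[2])
print(replaced[2])
print(hebrew[2])
```

The same handler names can be passed to pandas, for example pd.read_csv(path, encoding='utf-8', encoding_errors='replace'). If the source encoding is known, pd.read_csv(path, encoding='cp1255') avoids losing data altogether, and the resulting Python strings can be inserted into MongoDB unchanged.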
