I'm having trouble processing a CSV file in the following format:
Input .csv file:
1,abc,65.0,en-GB,"reverted,Knowledge Alert,ab00998978,1,Y,Y,default,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,zolhOgdmpwAQjfaUONdTD7,15.0,en-GB,"New & Dropped Routes,Article,KM100015050,2,N,Y,default,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,zolhOgdmpwAQjfaUONdTD7,4.0,en-GB,"New & Dropped Routes,Article,KM100015050,3,N,Y,default,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Metadata .csv file:
tablename,silver_tablename,fileformat,fileformat_historical,silver_overwrite,preprocessing,cdc,bronzeload,silverload,softdelete,harddelete,harddeleteonlykeys,deletecolumn,deletevalue,executionset,readeroptions,autoloaderoptions,autoloaderoptions_historical,enabled,encoding_format,bronze_overwrite,sensitive_data,data_protection
agent,agent,csv,csv,N,Y,N,Y,Y,Y,N,N,OPERATION,D,set1,"{'header': 'true', 'sep': 'chr(1)', 'quoting':'csv.QUOTE_NONE','readerCaseSensitive': 'false'}",,,Y,UTF-8,N,N,N
However, after running my code, all the data after the double quote ends up in a single column.
My code:
import pandas as pd
entity_df = pd.read_csv(entity_control_source_path, header=0, sep=",", quotechar='"', dtype=str)
1 Answer
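You can make pandas ignore quotes entirely by passing quoting=3 (the value of csv.QUOTE_NONE) to pd.read_csv(); see the official documentation. With quoting disabled, the unmatched " no longer swallows the rest of the file into one field; it stays in the data as an ordinary character.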
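A minimal sketch of this approach, using two rows from the question with the trailing empty fields trimmed for brevity. It assumes the file has no header row (hence header=None, unlike the header=0 in the question) and reads from an in-memory string rather than the real path; the final lstrip step is optional and not part of the original suggestion.

```python
import csv
import io

import pandas as pd

# Two rows from the question, trailing empty fields trimmed (assumption: no header row).
sample = (
    '1,abc,65.0,en-GB,"reverted,Knowledge Alert,ab00998978,1,Y,Y,default\n'
    '2,zolhOgdmpwAQjfaUONdTD7,15.0,en-GB,"New & Dropped Routes,Article,KM100015050,2,N,Y,default\n'
)

# quoting=csv.QUOTE_NONE (equal to 3) makes the parser treat '"' as a normal character,
# so every comma acts as a field separator.
df = pd.read_csv(
    io.StringIO(sample),
    header=None,
    sep=",",
    quoting=csv.QUOTE_NONE,
    dtype=str,
)

# The stray quote is still part of the fifth field; strip it if it is not wanted.
df[4] = df[4].str.lstrip('"')

print(df.shape)
print(df[4].tolist())
```

For the real file, replace io.StringIO(sample) with the file path and keep the header setting that matches the data.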
Alternatively, you can clean the data manually beforehand and then read it as usual.
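One possible way to do that cleanup, sketched under the assumption that no field legitimately contains a quoted comma, so every double quote can simply be dropped. The helper name read_csv_without_quotes is hypothetical, and the sample again uses trimmed rows from the question with no header row.

```python
import io

import pandas as pd

# Two rows from the question, trailing empty fields trimmed (assumption: no header row).
sample = (
    '1,abc,65.0,en-GB,"reverted,Knowledge Alert,ab00998978,1,Y,Y,default\n'
    '3,zolhOgdmpwAQjfaUONdTD7,4.0,en-GB,"New & Dropped Routes,Article,KM100015050,3,N,Y,default\n'
)


def read_csv_without_quotes(text: str) -> pd.DataFrame:
    """Hypothetical helper: drop every double quote, then parse with default settings."""
    cleaned = text.replace('"', "")
    return pd.read_csv(io.StringIO(cleaned), header=None, sep=",", dtype=str)


cleaned_df = read_csv_without_quotes(sample)
print(cleaned_df.shape)
print(cleaned_df[4].tolist())
```

If some fields do rely on quotes to protect embedded commas, this blanket removal would split them, so a more targeted cleanup would be needed.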