pandas python - CSV parsing problem - ignoring delimiters inside curly braces

cnh2zyt3 posted on 2023-02-17 in Python

I am having trouble parsing the following line of a CSV file:

UPDATED,464,**"{\"node-id\":\"\",\"change-type\":\"UPDATED\",\"object-type\":\"service\",\"internalgeneratedepoch\":1674472915591000,\"topic-name\":\"Service\",\"object-id\":\"wdm_tpdr_service1\",\"changed-attributes\":{\"lifecycle-state\":{\"old-value\":\" \",\"new-value\":\"planned\"},\"administrative-state\":{\"old-value\":\" \",\"new-value\":\"outOfService\"}},\"internaleventid\":464}"**,1674472915591000,,wdm_tpdr_service1,service

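The problem is the data in column 3 (highlighted in bold): it contains commas inside the curly braces and double quotes. I cannot read this column as a single data point, because pandas splits it at those commas, which are read as delimiters. Can anyone help?
I want to read the following string as a single data point: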

"{"node-id":"","change-type":"UPDATED","object-type":"service","internalgeneratedepoch":1674472915591000,"topic-name":"Service","object-id":"wdm_tpdr_service1","changed-attributes":{"lifecycle-state":{"old-value":" ","new-value":"planned"},"administrative-state":{"old-value":" ","new-value":"outOfService"}},"internaleventid":464}"

I tried this code:

csv_input = pd.read_csv(file_name, delimiter=',(?![^{]*})',engine="python",index_col=False)

But it does not work for all rows. Any help would be appreciated.


pes8fvy9 #1

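The code you provided does not work on every row because of the regular expression used as the delimiter. A regex separator is accepted when engine="python" is set, and the pattern itself is valid, but its lookahead only checks whether a closing brace appears before the next opening brace. On rows where the JSON contains nested objects, a comma that comes before an inner opening brace fails that check and is treated as a delimiter. Regex separators also tend to ignore quoting, so the double quotes around the third field do not protect its commas. To fix this, you can drop the regex and use a plain comma as the delimiter, telling the parser through the escapechar parameter that \" is an escaped quote so that the quoted third field is kept whole, or you can look for a more specific pattern in the delimiter parameter, such as a particular group of characters or words.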
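As a quick check of how this separator behaves, the sketch below applies the same pattern with `re.split` to two made-up rows (simplified, not taken from the original file). The pattern is a valid regular expression; the row with a nested object is where it splits in the wrong place, because a comma that comes before an inner `{` has no `}` ahead of it before that brace.

```python
import re

# The separator pattern from the question: a comma that is NOT followed
# by a "}" before the next "{" is reached.
SEP = re.compile(r',(?![^{]*})')

# Hypothetical, simplified rows used only for illustration.
flat_row = 'A,1,{"x":1,"k":4},9,B'
nested_row = 'A,1,{"x":1,"y":{"z":2},"k":4},9,B'

flat_fields = SEP.split(flat_row)
nested_fields = SEP.split(nested_row)

# Flat JSON: the cell survives as one field (5 fields in total).
print(len(flat_fields), flat_fields)

# Nested JSON: the comma before "y":{ is treated as a delimiter,
# so the JSON cell is cut in two (6 fields in total).
print(len(nested_fields), nested_fields)
```

Rows whose JSON has no nested object parse as expected, which would be consistent with the pattern working for some rows but not all.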
You can then use the json library to parse the string in the third column:

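import json

import pandas as pd

# assuming the file has no header row; escapechar makes \" a literal quote
# inside the quoted third field, so its commas are not treated as delimiters
csv_input = pd.read_csv(file_name, header=None, escapechar="\\")

# read the third column in the csv (with header=None the columns are labelled 0, 1, 2, ...)
third_column = csv_input[2]

# parse a single cell as json (json.loads takes a string, not a whole Series)
parsed_data = json.loads(third_column.iloc[0])

# use the parsed json data however you want

# If you want to store the parsed data in the csv, you can create a new column and add the results there.

csv_input['parsed_data'] = [json.loads(x) for x in third_column]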
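For an end-to-end check, here is a minimal sketch on one shortened sample row. It assumes that the file has no header row and that inner quotes are escaped as `\"`, as in the line from the question. It reads the row with `escapechar`, parses the third column with `json.loads`, and flattens the nested keys into columns with `pd.json_normalize`. The variable names are illustrative.

```python
import io
import json

import pandas as pd

# Shortened sample row in the same shape as the question's data.
# Assumption: no header row, inner quotes escaped with a backslash.
raw = (
    r'UPDATED,464,"{\"node-id\":\"\",\"changed-attributes\":'
    r'{\"lifecycle-state\":{\"old-value\":\" \",\"new-value\":\"planned\"}},'
    r'\"internaleventid\":464}",1674472915591000,,wdm_tpdr_service1,service'
)

# escapechar turns \" into a literal quote inside the quoted field,
# so the commas in the JSON stay within one cell.
df = pd.read_csv(io.StringIO(raw), header=None, escapechar="\\")

# Parse each cell of the third column (label 2) into a dict.
records = [json.loads(cell) for cell in df[2]]

# Flatten nested keys into dotted column names,
# e.g. "changed-attributes.lifecycle-state.new-value".
flat = pd.json_normalize(records)

print(df.shape)
print(flat.columns.tolist())
```

If a plain dict per row is enough, the `json_normalize` step can be skipped and `records` used directly.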
