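I am having trouble writing data to a correctly formatted CSV. The data is anime TV shows and includes fields such as title, genres, and synopsis. After parsing the data from an API call, everything is written to the CSV correctly except the genres and the synopsis. If an anime has several genres, only the first one seems to be wrapped in "", and the rest are merged into the beginning of the synopsis.
Parse function: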
import re

def parse_data(data):
    try:
        # Genre parse: join all genre names into one comma-separated string
        genres_list = data['genres']
        genres = ', '.join(genre['name'] for genre in genres_list)
        print(genres)
        # Studio parse: take the first 'name' from the stringified studios value
        studio_name = "unknown"
        studio_parse = str(data.get('studios'))
        match = re.search(r"'name':\s*'([^']*)'", studio_parse)
        if match:
            studio_name = match.group(1)
        # Synopsis parse: strip the source and rewrite credits
        synopsis_dirty = data['synopsis']
        synopsis = re.sub(r"\(Source: [^\)]+\)", "", synopsis_dirty).strip()
        synopsis = re.sub(r'\[Written by MAL Rewrite\]', '', synopsis).strip()
        details = (
            str(data['id']) + ',' + data['title'].encode('utf-8').decode('cp1252', 'replace')
            + ',' + data['start_date'] + ',' + str(data['mean']) + ',' + str(data['rank'])
            + ',' + str(data['popularity']) + ',' + str(data['num_episodes'])
            + ',' + data['rating'] + ',' + studio_name + ',' + genres
            + ',' + synopsis.encode('utf-8').decode('cp1252', 'replace')
        )
        # Splits on every comma, including the commas inside genres and synopsis
        split_data = re.split(r"[,]", details)
        return split_data
    # (the except clause is not shown in the post)
Dict format:
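def parsed_data_to_dict(data):
    try:
        dict = {
            "id" : data[0],
            "title" : data[1].encode('utf-8').decode('cp1252', 'replace'),
            "start-date" : data[2],
            "mean" : data[3],
            "rank" : data[4],
            "popularity" : data[5],
            "num_episodes" : data[6],
            "rating" : data[7],
            "studio" : data[8],
            # Only the first piece of the split genres string ends up here
            "genres" : data[9],
            # All remaining pieces are joined back together without their commas
            "synopsis" : ''.join(data[10:]).encode('utf-8').decode('cp1251', 'replace')
        }
    # (the rest of the function is not shown in the post)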
Expected CSV:
"8","Bouken Ou Beet","2004-09-30","6.93","4426","5274","52","pg","Toei Animation","Adventure, Fantasy, Shounen, Supernatural", "It is the dark century and the people are suffering under the rule of the devil Vandel who is able to manipulate monsters."
Actual CSV:
"8","Bouken Ou Beet","2004-09-30","6.93","4426","5274","52","pg","Toei Animation","Adventure"," Fantasy Shounen SupernaturalIt is the dark century and the people are suffering under the rule of the devil Vandel who is able to manipulate monsters."
2 Answers
tuwxkamq1#
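You are building the comma-separated string by hand and using it as raw CSV. This approach is error-prone, especially when the strings themselves contain commas. You always have to take care to combine the quote characters and the strings correctly.
I suggest you use the csv module from the standard library instead.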
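As a minimal sketch of that suggestion, csv.writer can do the quoting for you. The row below is a hypothetical one modeled on the expected output in the question, the helper name rows_to_csv is made up for illustration, and quoting=csv.QUOTE_ALL is an assumption chosen because the expected CSV quotes every field.

```python
import csv
import io

# Hypothetical row modeled on the question's expected output.
# The genres field contains commas; the last field contains commas and quotes.
row = [
    8,
    "Bouken Ou Beet",
    "2004-09-30",
    6.93,
    "Adventure, Fantasy, Shounen, Supernatural",
    'He said "hello", then left.',
]


def rows_to_csv(rows):
    """Serialize rows to CSV text, wrapping every field in double quotes."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv([row])
print(csv_text)
```

Because the writer quotes each field and doubles any embedded " characters, the genres stay in a single column and no manual string concatenation is needed.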
af7jpaap2#
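Before the line that concatenates the fields into details, you already have data, which is a dictionary and is therefore suitable for use with csv.DictWriter (part of the standard library). It allows strings to contain , and also takes care of strings containing ". For example, writing rows whose values contain , or " to a file names.csv with csv.DictWriter produces correctly quoted output.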
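The example from the original answer is not preserved here, so what follows is only a sketch of the same idea rather than the answerer's code. The records and the write_records and read_records helpers are hypothetical. The file is written to a temporary directory to avoid touching the working directory, and quoting=csv.QUOTE_ALL is an assumption made to match the fully quoted output the question expects (the default quoting would also handle commas and quotes correctly).

```python
import csv
import tempfile
from pathlib import Path

FIELDNAMES = ["id", "title", "genres", "synopsis"]

# Hypothetical records; the values deliberately contain commas and double quotes.
records = [
    {
        "id": 8,
        "title": "Bouken Ou Beet",
        "genres": "Adventure, Fantasy, Shounen, Supernatural",
        "synopsis": "It is the dark century and the people are suffering "
                    "under the rule of the devil Vandel who is able to "
                    "manipulate monsters.",
    },
    {
        "id": 9,
        "title": 'A "quoted" title',
        "genres": "Comedy",
        "synopsis": 'She said "run", so they ran.',
    },
]


def write_records(path, rows):
    """Write dictionaries to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES, quoting=csv.QUOTE_ALL)
        writer.writeheader()
        writer.writerows(rows)


def read_records(path):
    """Read the CSV file back into a list of dictionaries."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


out_path = Path(tempfile.mkdtemp()) / "names.csv"
write_records(out_path, records)
print(out_path.read_text(encoding="utf-8"))

round_trip = read_records(out_path)
```

Reading the file back with csv.DictReader returns the genres and synopsis as single values, which is a quick way to check that the quoting worked. Note that all values come back as strings.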