Preserving special characters when writing to a CSV: what encoding to use?

Posted by envsm3lx on 2023-04-03 in Other


I am trying to save the string the United Nations’ Sustainable Development Goals (SDGs) to a CSV file.

If I use utf-8 as the encoding, the apostrophe in the string gets converted to ASCII characters when I read the file back:

import csv
str_ = "the United Nations’ Sustainable Development Goals (SDGs)"

#write to a csv file
with open("output.csv", 'w', newline='', encoding='utf-8') as file:
    csv_writer = csv.writer(file,delimiter=",")
    csv_writer.writerow([str_])

#read from the csv file created above
with open("output.csv",newline='') as file:
    csv_reader = csv.reader(file)

    for row in csv_reader:
        print(row)

The result is ['the United Nationsâ€™ Sustainable Development Goals (SDGs)']

If I use cp1252 as the encoding, the apostrophe in the string is preserved, as you can see in the result:

import csv
str_ = "the United Nations’ Sustainable Development Goals (SDGs)"

#write to a csv file
with open("output.csv", 'w', newline='', encoding='cp1252') as file:
    csv_writer = csv.writer(file,delimiter=",")
    csv_writer.writerow([str_])

#read from the csv file created above
with open("output.csv",newline='') as file:
    csv_reader = csv.reader(file)

    for row in csv_reader:
        print(row)

The result I get is ['the United Nations’ Sustainable Development Goals (SDGs)'], which is ideal.

If I want to preserve special characters, which encoding should I use? What are the benefits of using utf-8 over cp1252?

My use case is feeding the rows of the CSV to a language model (GPT), so I want the text to stay "English" / unchanged.
I am using Python 3.8 on Windows 11.

csv

Source: https://stackoverflow.com/questions/75884290/preserving-special-characters-when-writing-to-a-csv-what-encoding-to-use

1 Answer


wqnecbli1#

with open("output.csv", 'w', newline='', encoding='utf-8') as file:
    ...

with open("output.csv",newline='') as file:
    ...

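The problem is simple: you explicitly and correctly write UTF-8 to the file, but then open it for reading with an unspecified, implicit encoding, and in your case that default is not UTF-8 (on Windows it is typically the locale's code page, such as cp1252).
Specify the encoding when reading the file as well, and everything works: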

with open('output.csv', newline='', encoding='utf-8') as file:
    ...
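
To make the mismatch concrete, here is a minimal sketch that reproduces the garbled output without touching a file. It assumes the implicit read encoding on the asker's machine is cp1252, which the â€™ in the output suggests but the question does not state outright. The variable names are only for illustration.

```python
import locale

text = "the United Nations’ Sustainable Development Goals (SDGs)"

# What open(..., 'w', encoding='utf-8') puts on disk:
# the curly apostrophe U+2019 becomes the three bytes E2 80 99.
utf8_bytes = text.encode("utf-8")

# What a reader produces if it decodes those bytes as cp1252
# (assumed here to be the implicit default on the asker's system).
misread = utf8_bytes.decode("cp1252")

# What a reader produces when it is told the correct encoding.
correct = utf8_bytes.decode("utf-8")

print(misread)
print(correct == text)

# The encoding that open() falls back to on this machine when none is given
# (platform dependent, so no expected output is shown).
print(locale.getpreferredencoding(False))
```

The cp1252 run in the question only appears to work because the same wrong default is applied consistently: the file is written as cp1252 and then read back as cp1252.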

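You *should* use UTF-8 as the encoding, because it can encode every possible character. Most other encodings can only encode a subset of all characters. You need a good reason to use a different encoding. If you have a specific target (such as Excel) and you know which encoding that target prefers, use that one. Otherwise, UTF-8 is a sensible default.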
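
As a rough illustration of the "subset" point, the sketch below round-trips rows through a temporary CSV using the same explicit encoding on both sides. The helper name `roundtrip` and the sample rows are made up for this example. UTF-8 handles every row, while cp1252 raises an error as soon as it meets a character outside its repertoire.

```python
import csv
import os
import tempfile


def roundtrip(rows, encoding):
    """Write rows to a temporary CSV and read them back with the same explicit encoding."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", newline="", encoding=encoding) as f:
            csv.writer(f).writerows(rows)
        with open(path, newline="", encoding=encoding) as f:
            return list(csv.reader(f))
    finally:
        os.remove(path)


# Hypothetical sample data: the asker's string plus text cp1252 cannot represent.
rows = [
    ["the United Nations’ Sustainable Development Goals (SDGs)"],
    ["日本語", "naïve"],
]

# UTF-8 can represent every character, so the data comes back unchanged.
print(roundtrip(rows, "utf-8") == rows)

# cp1252 covers only a limited set of characters and fails on the second row.
try:
    roundtrip(rows, "cp1252")
except UnicodeEncodeError as exc:
    print("cp1252 could not encode part of:", repr(exc.object))
```

If the target really is Excel, one commonly used option is Python's `utf-8-sig` codec, which writes a byte order mark that Excel generally uses to recognize UTF-8. Treat that as something to verify against your Excel version rather than a guarantee.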

2023-04-03
