How to fix the "character maps to <undefined>" Unicode error

Posted by bsxbgnwa on 2021-06-20 in Mysql


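I am trying to export data from a database to a CSV file so it can be moved into a cloud data warehouse. Every time I run the export, it stops after row 36599 with this error: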

UnicodeEncodeError: 'charmap' codec can't encode character '\x92' in position 62: character maps to <undefined>

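The string that triggers the problem is 'Rejected-Case No. doesn’t match', and I suspect the apostrophe is the cause. I don't understand why it breaks, and I haven't been able to find a fix. Does anyone know how to solve this? The code I'm using is: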

import csv
import time

import pymysql

db = pymysql.connect(host='host', port=3306, user="user", passwd="secret",
                     db="db", autocommit=True)
cur = db.cursor()

# cur.execute("call inv1_view_prod.`Email_agg`")

cur.execute("""select fields from table""")

emails = cur.fetchall()
# Raw string so the backslashes in the Windows path are not treated as escapes
with open(r'O:\file\path\to\File_name.csv', 'w') as fileout:
    writer = csv.writer(fileout)
    writer.writerows(emails)
time.sleep(1)

mysql python csv unicode

Source: https://stackoverflow.com/questions/51137970/how-to-fix-character-maps-to-undefined-unicode-error

1 answer

disho6za1#

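Since you have not shown the code that causes the error, I can only guess.
The one certain fact is that the string 'Rejected-Case No. doesn’t match' contains "’", which is the Unicode character U+2019, RIGHT SINGLE QUOTATION MARK. In the Windows cp1252 code page, this character has the code 0x92.
It looks like somewhere you have a byte string encoded in cp1252 that was never correctly decoded into a Unicode string.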
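Below is a minimal sketch of that hypothesis. It uses only Python 3's built-in codecs, and the sample string is a fragment of the one from the question. It shows three things. U+2019 is the single byte 0x92 in cp1252. Decoding that byte as latin-1 instead yields the control character U+0092. cp1252 then has no byte for U+0092, which reproduces the same 'charmap' error. Assumption: the question's output file was opened with cp1252 as its default encoding, which the 'charmap' codec in the traceback suggests on Windows.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK is the single byte 0x92 in cp1252.
quote = '\u2019'
quote_bytes = quote.encode('cp1252')

# Decoding those bytes as latin-1 (instead of cp1252) yields U+0092,
# a C1 control character, not the quotation mark.
mis_decoded = b'doesn\x92t'.decode('latin1')

# cp1252 has no byte for U+0092, so encoding it fails with the same
# 'charmap' codec error as in the question.
error_message = ''
try:
    mis_decoded.encode('cp1252')
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(quote_bytes)
print(repr(mis_decoded))
print(error_message)
```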
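What to do:
There are solutions. Unfortunately, they depend on which Python version you are using (2 or 3), and without seeing your code I can only give general advice:
identify the input charset (what the database hands to the Python script)
identify the output charset (what you want the csv module to write)
use explicit conversions so that the correct charset is used at each step
optionally pass errors='replace' to encode/decode calls to avoid UnicodeError exceptions.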
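Here is a minimal Python 3 sketch of those steps. It assumes, hypothetically, that the input bytes are cp1252 and that the CSV should be UTF-8. INPUT_CHARSET, OUTPUT_CHARSET, to_text and the sample row are illustrative names, not part of the question. To keep the example runnable without a database or file, the CSV is built in memory with io.StringIO. With a real file, open(path, 'w', encoding=OUTPUT_CHARSET, errors='replace', newline='') plays the same role as the final encode step.

```python
import csv
import io

# Hypothetical: the charset the data source actually used for its bytes.
INPUT_CHARSET = 'cp1252'
# Hypothetical: the charset the CSV consumer expects.
OUTPUT_CHARSET = 'utf-8'


def to_text(raw: bytes) -> str:
    """Decode bytes with the declared input charset; bad bytes become U+FFFD."""
    return raw.decode(INPUT_CHARSET, errors='replace')


# Illustrative input row, as raw bytes from the source.
rows_as_bytes = [(b'Rejected-Case No. doesn\x92t match', b'42')]
rows = [[to_text(cell) for cell in row] for row in rows_as_bytes]

# Build the CSV as text; newline='' leaves the csv module's \r\n untouched.
buffer = io.StringIO(newline='')
csv.writer(buffer).writerows(rows)

# Encode the finished text explicitly with the output charset.
data = buffer.getvalue().encode(OUTPUT_CHARSET, errors='replace')
print(data)
```

Each conversion names its charset explicitly, so the result no longer depends on the platform's default encoding.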
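If you are using Python 3, I will assume you have a problem decoding Unicode from the database. The right single quotation mark has the Unicode code U+2019, but in the string handed to Python it appears as '\x92', which is its cp1252 byte encoding. A quick and dirty workaround is to force an encode/decode round trip to get the correct Unicode string. Your code could become: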

import csv

import pymysql

db = pymysql.connect(host='host', port=3306, user="user", passwd="secret",
                     db="db", autocommit=True)
cur = db.cursor()

# cur.execute("call inv1_view_prod.`Email_agg`")

cur.execute("""select fields from table""")

charset = 'cp1252'   # or 'utf8' depending on what you want in the csv file
# Raw string so the backslashes in the Windows path are not treated as escapes
with open(r'O:\file\path\to\File_name.csv', 'w', encoding=charset,
          errors='replace', newline='') as fileout:
    writer = csv.writer(fileout)
    for row in cur.fetchall():
        # Re-decode only text fields; numbers, dates and NULLs (None)
        # have no .encode() method and are written unchanged
        writer.writerow([field.encode('latin1').decode('cp1252', errors='replace')
                         if isinstance(field, str) else field
                         for field in row])

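The encode('latin1').decode('cp1252') call is only a trick for repairing Python 3 strings whose characters hold the codes of their byte encoding. It works because latin1 encoding is a no-op for every code below 256: each such character becomes the byte with the same value.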
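Here is a short demonstration of why the trick works and where it stops working. It uses only built-in codecs, and the sample string is a fragment of the one from the question. The last step illustrates a limitation: a string that is already correctly decoded can contain characters above U+00FF, such as U+2019 itself. For such a string, encode('latin1') raises instead of acting as a no-op.

```python
# latin-1 maps code points 0-255 one-to-one onto bytes 0-255, so encoding
# with it hands back exactly the byte values hidden in the string.
assert all(chr(i).encode('latin1') == bytes([i]) for i in range(256))

broken = 'doesn\x92t'            # U+0092 stands in for the cp1252 byte 0x92
raw = broken.encode('latin1')    # recovers the original bytes
fixed = raw.decode('cp1252')     # interprets 0x92 correctly as U+2019

# The trick only applies to strings whose characters are all below U+0100.
try:
    fixed.encode('latin1')
    round_trip_ok = True
except UnicodeEncodeError:
    round_trip_ok = False

print(repr(fixed), round_trip_ok)
```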
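The errors='replace' option tells Python never to raise a UnicodeError exception. When encoding to a byte string, it substitutes '?' for characters that cannot be represented. When decoding to a Unicode string, it substitutes the official Unicode replacement character U+FFFD '�' for bytes that cannot be decoded.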
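Here is a quick illustration of the two substitution characters, using illustrative sample strings. The encode case uses ASCII because ASCII cannot represent U+2019. The decode case uses UTF-8 because a lone 0x92 byte is not valid UTF-8.

```python
text = 'doesn\u2019t'

# Encoding: a character the target charset cannot represent becomes '?'.
as_bytes = text.encode('ascii', errors='replace')

# Decoding: a byte that is invalid in the source charset becomes U+FFFD.
as_str = b'doesn\x92t'.decode('utf-8', errors='replace')

print(as_bytes)
print(as_str)
```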
It might be cleaner to use the charset option of pymysql.connect. Unfortunately, I have never used MySQL databases from Python...

