Regex: replacing special characters in a Python string without replacing Chinese characters

xqnpmsa8 posted on 2023-05-19 in Python

I cannot seem to substitute a ')' or a '(' without corrupting other strings. ')' and '(' are special characters. Here are two strings: "sample(志信达).mbox" and "sample#宋安兴.mbox". If I use re to substitute the special characters, one of the Chinese characters gets substituted too. Here is the Python code:

# -*- coding: utf-8 -*-
import re
source1='sample(志信达).mbox'
source2='sample#宋安兴.mbox'
newname1=re.sub(r'[\(\);)(]','-',source1)
newname2=re.sub(r'[\(\);)(]','-',source2)
print source1,newname1
print source2,newname2

The result is:

sample(志信达).mbox sample---志信达---.mbox
sample#宋安兴.mbox sample#宋?-兴.mbox

Note that one of the characters was replaced with '?-'.

llycmphe


You should use unicode literals (see https://docs.python.org/2/howto/unicode.html#unicode-literals-in-python-source-code):

# -*- coding: utf-8 -*-
import re
source1 = u'sample(志信达).mbox'
source2 = u'sample#宋安兴.mbox'
newname1 = re.sub(ur'[\(\);)(]','-',source1)
newname2 = re.sub(ur'[\(\);)(]','-',source2)
print source1,newname1
print source2,newname2

Result:

sample(志信达).mbox sample-志信达-.mbox
sample#宋安兴.mbox sample#宋安兴.mbox
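
For anyone wondering why the byte-string version mangles 安, here is a likely explanation. It assumes the pattern originally also contained the full-width parentheses （ and ） (U+FF08 and U+FF09), which would account for each parenthesis turning into three dashes. In Python 2 a plain '...' literal is a sequence of UTF-8 bytes, so the character class matches single bytes rather than whole characters. ） encodes as EF BC 89 and 安 as E5 AE 89, so the trailing 0x89 byte of 安 is replaced and what remains no longer decodes. The sketch below reproduces this in Python 3 syntax (my choice, not the original poster's); `PATTERN_TEXT`, `sub_bytes` and `sub_text` are names made up for illustration.

```python
# -*- coding: utf-8 -*-
import re

# Assumption: the character class held ASCII and full-width parentheses plus ';'.
PATTERN_TEXT = u'[();\uff08\uff09]'


def sub_bytes(name):
    """Mimic a Python 2 byte string: the class matches individual UTF-8 bytes."""
    pattern = PATTERN_TEXT.encode('utf-8')
    return re.sub(pattern, b'-', name.encode('utf-8'))


def sub_text(name):
    """Match whole code points, as the unicode-literal version does."""
    return re.sub(PATTERN_TEXT, u'-', name)


broken = sub_bytes(u'sample#宋安兴.mbox')
fixed = sub_text(u'sample#宋安兴.mbox')

# ascii() keeps the output printable on any terminal encoding.
print(ascii(broken))  # the last byte of 安 (0x89) has become '-'
print(ascii(fixed))   # unchanged, because no whole character matched
```

With text patterns, 安 is a single code point that is not in the class, so it is left alone, which is the same effect the u'...' and ur'...' literals have in the Python 2 code above.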

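Also, don't forget to save the .py file as UTF-8 (your IDE may do this automatically, or you may need to change the encoding manually, depending on the text editor you use).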
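
If you are not sure how your editor saved the file, a quick check is to read the raw bytes and try to decode them as UTF-8. This is only a rough sketch: `decodes_as_utf8` and `file_is_utf8` are hypothetical helper names, and a successful decode shows the bytes are valid UTF-8, not that the editor is configured for it.

```python
# -*- coding: utf-8 -*-


def decodes_as_utf8(data):
    """Return True if the given bytes form valid UTF-8."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


def file_is_utf8(path):
    """Read the file at `path` in binary mode and test its bytes."""
    with open(path, 'rb') as handle:
        return decodes_as_utf8(handle.read())
```

For example, `file_is_utf8('rename.py')` (a made-up file name) should return False for a script containing Chinese literals that was saved in a legacy encoding such as GBK.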
