Python regex: remove all punctuation except hyphens from a Unicode string

Posted by kulphzqa on 2023-06-30 in Python


I have this code to remove all punctuation from a string using a regex:

import regex as re
re.sub(r"\p{P}+", "", txt)

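How would I change this to allow hyphens? An explanation of how you did it would be great. As I understand it, and correct me if I'm wrong, \p{P} matches any punctuation character.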
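To make that understanding concrete: \p{P} matches any character whose Unicode general category starts with P (Pc, Pd, Ps, Pe, Pi, Pf, Po). A minimal stdlib sketch, assuming unicodedata.category as a stand-in for \p{P} (is_punct is a hypothetical helper name), shows why the original pattern also strips hyphens: "-" is in category Pd, dash punctuation.

```python
import unicodedata

def is_punct(ch: str) -> bool:
    r"""Stand-in for \p{P}: true when the general category starts with 'P'."""
    return unicodedata.category(ch).startswith("P")

# Print each sample character with its general category and the check result.
for ch in "-_!«»¿$":
    print(repr(ch), unicodedata.category(ch), is_punct(ch))
```

Note that "$" is a currency symbol (category Sc), so \p{P} leaves it alone, while "_" (Pc) counts as punctuation.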


Source: https://stackoverflow.com/questions/21209024/python-regex-remove-all-punctuation-except-hyphen-for-unicode-string

4 answers


ih99xse11#

[^\P{P}-]+

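\P is the complement of \p: "not punctuation". So this class matches anything that is not (not punctuation or a dash), which leaves all punctuation except the dash.
Example: http://www.rubular.com/r/JsdNM3nFJ3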
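The double negative is De Morgan's law: not (not punctuation or "-") is the same as (punctuation and not "-"). The standard re module has no \p or \P, so here is a stdlib sketch that applies the literal class logic per character through unicodedata; in_class and strip_punct_except_hyphen are hypothetical names for illustration.

```python
import unicodedata

def in_class(ch: str) -> bool:
    # Literal reading of [^\P{P}-]: NOT (non-punctuation OR hyphen).
    non_punct = not unicodedata.category(ch).startswith("P")
    return not (non_punct or ch == "-")

def strip_punct_except_hyphen(text: str) -> str:
    # Drop every character the class would match; keep everything else.
    return "".join(ch for ch in text if not in_class(ch))

print(strip_punct_except_hyphen("¡Hola! «well-known» ¿sí?"))
# Hola well-known sí
```

Only U+002D HYPHEN-MINUS is kept; other dash characters such as U+2010 HYPHEN are also category Pd and are removed.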
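If you want a less convoluted way, an alternative is \p{P}(?<!-): match any punctuation character, then check that it isn't a dash with a negative lookbehind.
Working example: http://www.rubular.com/r/5G62iSYTdk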
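The lookbehind trick also works in the standard re module if you swap \p{P} for an explicit class. A minimal sketch, assuming ASCII-only punctuation from string.punctuation is enough (unlike \p{P}, it misses characters such as « or ¿):

```python
import re
import string

# Any ASCII punctuation character, then a negative lookbehind that rejects
# the match if the character just consumed was "-".
PATTERN = re.compile("[{}](?<!-)".format(re.escape(string.punctuation)))

print(PATTERN.sub("", "a-b, c! (d-e)"))  # a-b c d-e
```

Because the class consumes exactly one character, the lookbehind only ever inspects that character, so it acts as "any punctuation except -".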

2023-06-30

ki0zmccv2#

Here's how to do it with the re module, in case you have to stick to the standard library:

# works in Python 2 and 3
import re
import string

remove = string.punctuation
remove = remove.replace("-", "")  # don't remove hyphens
# re.escape is needed: unescaped, the "\" in string.punctuation escapes the
# following "]", so backslashes would never be removed
pattern = r"[{}]".format(re.escape(remove))  # create the pattern

txt = ")*^%{}[]thi's - is - @@#!a !%%!!%- test."
print(re.sub(pattern, "", txt))
# this - is - a - test
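One caveat for the Unicode case in the question: string.punctuation holds only the 32 ASCII punctuation characters, so this pattern leaves non-ASCII punctuation in place, and it also removes some ASCII symbols that \p{P} would not match. A quick check, reusing the same construction:

```python
import re
import string
import unicodedata

remove = string.punctuation.replace("-", "")
pattern = r"[{}]".format(re.escape(remove))

print(len(string.punctuation))                   # 32
print(re.sub(pattern, "", "«well-known» ¿ok?"))  # «well-known» ¿ok

# Characters in string.punctuation that \p{P} would NOT match,
# because their Unicode general category is a symbol (S*), not punctuation.
symbols = "".join(c for c in string.punctuation
                  if not unicodedata.category(c).startswith("P"))
print(symbols)                                   # $+<=>^`|~
```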

If performance matters, you may want to use str.translate instead, since it's faster than using a regex. In Python 3 the code is txt.translate({ord(char): None for char in remove}).
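A minimal Python 3 sketch of the str.translate route. str.maketrans with a third argument builds the same deletion table as the dict comprehension above, mapping each listed character to None. Like the regex version, it only covers ASCII punctuation.

```python
import string

remove = string.punctuation.replace("-", "")

# str.maketrans(x, y, z): every character in z is mapped to None (deleted).
table = str.maketrans("", "", remove)

txt = ")*^%{}[]thi's - is - @@#!a !%%!!%- test."
print(txt.translate(table))  # this - is - a - test
```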


yyyllmsg3#

You can list the punctuation to remove by hand, as in [._,], or pass a function instead of a replacement string:

import regex as re  # \p{P} needs the third-party regex module, not the stdlib re
re.sub(r"\p{P}", lambda m: "-" if m.group(0) == "-" else "", text)
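The same callable-replacement idea can be sketched with the standard library alone. This assumes [^\w\s]|_ as a candidate pattern (my choice, not from the answer) and lets a function decide per match with unicodedata; keep_hyphen and CANDIDATES are hypothetical names.

```python
import re
import unicodedata

def keep_hyphen(m: re.Match) -> str:
    ch = m.group(0)
    # Delete punctuation (category P*) except the ASCII hyphen-minus; keep the rest.
    if ch != "-" and unicodedata.category(ch).startswith("P"):
        return ""
    return ch

# [^\w\s] finds non-word, non-space candidates; "_" is added because \w
# covers it even though it is punctuation (category Pc).
CANDIDATES = re.compile(r"[^\w\s]|_")

print(CANDIDATES.sub(keep_hyphen, "price: $5_000 «ok»-yes!"))  # price $5000 ok-yes
```

Symbols such as "$" are matched as candidates but returned unchanged, which mirrors what \p{P} does.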


pinkon5k4#

You could try:

import re, string

text = ")*^%{}[]thi's - is - @@#!a !%%!!%- test."

# escape the characters so "\" and "]" are treated literally inside [...]
exclusion_pattern = r"([{}])".format(re.escape(string.punctuation.replace("-", "")))

result = re.sub(exclusion_pattern, r"", text)

print(result)

Output: this - is - a - test

