Regex: remove all special characters that are not apostrophes between letters

Posted by ohfgkhjo on 2022-11-18 in Other


I have a string like this:

s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."

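I am trying to use re.sub to replace every special character that is not an apostrophe between letters with a space, so that 'gluten-free' becomes gluten free and i'm stays as i'm.
I tried this: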

import re

s = re.sub('[^[a-z]+\'?[a-z]+]', ' ', s)

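My intent was to replace with whitespace anything that does not follow the pattern of one or more letters, followed by zero or one apostrophe, followed by one or more letters.
This returns the same string: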

i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread.

I would like to get:

i'm sorry  sir  but this is a  gluten free  restaurant  we don't serve bread

regex

Source: https://stackoverflow.com/questions/64091221/regex-remove-all-special-characters-that-are-not-apostrophes-between-letters

2 Answers


xwbd5t1u1#

You can use this regex with a lookbehind nested inside a negative lookahead:

>>> import re
>>> s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."
>>> print(re.sub(r"(?!(?<=[a-z])'[a-z])[^\w\s]", ' ', s, flags=re.I))
i'm sorry  sir  but this is a  gluten free  restaurant  we don't serve bread


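RegEx details:

(?!: start of the negative lookahead
(?<=[a-z]): positive lookbehind asserting that the previous position holds a letter
': match an apostrophe
[a-z]: match a letter [a-z]
): end of the negative lookahead
[^\w\s]: match a character that is neither whitespace nor a word character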
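
To make the breakdown above easier to check, here is a minimal sketch that wraps the same pattern in a helper and runs it on a few edge cases. The name replace_specials and the extra test strings are illustrative assumptions, not part of the original answer.

```python
import re

# Same pattern as in the answer, compiled once.
# re.I lets [a-z] match uppercase letters as well.
PATTERN = re.compile(r"(?!(?<=[a-z])'[a-z])[^\w\s]", flags=re.I)

def replace_specials(text):
    """Hypothetical helper: replace each special character with one space,
    keeping apostrophes that sit directly between two letters."""
    return PATTERN.sub(" ", text)

s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."
print(repr(replace_specials(s)))               # repr() makes the trailing space visible
print(repr(replace_specials("'quoted'")))      # both quotes become spaces
print(repr(replace_specials("Rock'N'Roll!")))  # inner apostrophes kept, "!" replaced
print(repr(replace_specials("snake_case")))    # "_" is a word character, so it is kept
```

Each match is a single character, so a comma followed by a space yields two spaces, and the final period leaves a trailing space. If that matters, chaining .strip() or collapsing whitespace afterwards is one option.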


z2acfund2#

You can use

import re
s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."
print( re.sub(r"(?:(?!\b['‘’]\b)[\W_])+", ' ', s).strip() )
# => i'm sorry sir but this is a gluten free restaurant we don't serve bread


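Details:
(?: - start of a non-capturing group:
(?!\b['‘’]\b) - a negative lookahead that fails the match if the next character is an apostrophe enclosed by word characters
[\W_] - a non-word character or _
)+ - one or more occurrences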
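
As a rough illustration of how this pattern behaves compared with a one-character-at-a-time replacement, the sketch below runs it on inputs with curly apostrophes, underscores, and runs of punctuation. The name normalize and the sample strings are assumptions made for the example.

```python
import re

# Same pattern as in the answer: a run of non-word characters or underscores,
# skipping any apostrophe (straight or curly) that has a word character on both sides.
PATTERN = re.compile(r"(?:(?!\b['‘’]\b)[\W_])+")

def normalize(text):
    """Hypothetical helper: collapse each run of special characters
    (including whitespace and "_") into one space, then trim the ends."""
    return PATTERN.sub(" ", text).strip()

print(normalize("we don’t serve bread."))   # curly apostrophe inside a word is kept
print(normalize("snake_case -- 'x'"))       # "_" and the quote marks are replaced
print(normalize("sorry,   sir"))            # a whole run becomes one space
```

Because the group is quantified with +, adjacent special characters and spaces are replaced together, which is why this variant produces single spaces. Note that \b treats digits and underscores as word characters, so an apostrophe between digits would also be kept.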

