Regex: remove all special characters except apostrophes between letters

Posted by ohfgkhjo on 2022-11-18 in Other

I have a string like this:

s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."

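I am trying to use re.sub to replace every special character that is not an apostrophe between two letters with a space, so that "gluten-free" becomes gluten free while i'm stays as it is.
I tried this: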

import re

s = re.sub('[^[a-z]+\'?[a-z]+]', ' ', s)

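What I meant to express was: replace with whitespace anything that does not follow the pattern of one or more letters, then zero or one apostrophe, then one or more letters.
This returns the same string: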

i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread.

I expected:

i'm sorry  sir  but this is a  gluten free  restaurant  we don't serve bread
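
A note on why the attempt probably leaves the string untouched: square brackets do not nest in Python's re syntax, so the pattern is not read as "not (letters, apostrophe, letters)". The sketch below spells out how the pattern appears to be parsed. The input ", ab]" and the variable names are made up for illustration only.

```python
import re
import warnings

s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."

# The pattern from the question. Python reads it as:
#   [^[a-z]+   one or more characters that are neither '[' nor a-z
#   '?         an optional apostrophe
#   [a-z]+     one or more letters
#   ]          a literal ']'
attempt = '[^[a-z]+\'?[a-z]+]'

with warnings.catch_warnings():
    # Some Python versions emit a FutureWarning about a possible nested set here.
    warnings.simplefilter("ignore", FutureWarning)
    pattern = re.compile(attempt)

# s contains no ']' at all, so the pattern can never match and re.sub changes nothing.
no_match = pattern.search(s)

# Hypothetical input that does end in ']', to show what the pattern matches.
demo_match = pattern.search(", ab]")

print(no_match)
print(demo_match.group())
```

Because a match has to end in a literal ']', any input without that character comes back from re.sub unchanged.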

Answer 1 (xwbd5t1u)

You can use this regex, which nests a lookbehind inside a negative lookahead:

>>> import re
>>> s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."
>>> print(re.sub(r"(?!(?<=[a-z])'[a-z])[^\w\s]", ' ', s, flags=re.I))
i'm sorry  sir  but this is a  gluten free  restaurant  we don't serve bread


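RegEx details:

  • (?!: start of the negative lookahead
  • (?<=[a-z]): positive lookbehind asserting that the previous character is a letter
  • ': match an apostrophe
  • [a-z]: match a letter in [a-z]
  • ): end of the negative lookahead
  • [^\w\s]: match any character that is neither a word character nor whitespace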
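
The pieces listed above can also be written with re.VERBOSE so that each part sits next to its comment. This is only a sketch: the function name replace_specials and the extra sample inputs are assumptions added for illustration, not part of the answer.

```python
import re

PATTERN = re.compile(
    r"""
    (?!              # start negative lookahead: do not match here if ...
        (?<=[a-z])   #   the previous character is a letter,
        '            #   the current character is an apostrophe,
        [a-z]        #   and the next character is a letter
    )                # end negative lookahead
    [^\w\s]          # otherwise match one non-word, non-whitespace character
    """,
    re.IGNORECASE | re.VERBOSE,
)


def replace_specials(text):
    """Replace each special character with one space (hypothetical helper)."""
    return PATTERN.sub(" ", text)


print(replace_specials("don't"))        # apostrophe between letters is kept
print(replace_specials("'quoted'"))     # apostrophes at word edges are replaced
print(replace_specials("rock-n-roll"))  # hyphens are replaced
print(replace_specials("snake_case"))   # '_' counts as a word character, so it stays
```

Each special character is replaced one for one, so a comma followed by a space yields two spaces, and a trailing period leaves a trailing space.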

Answer 2 (z2acfund)

You can use

import re
s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."
print( re.sub(r"(?:(?!\b['‘’]\b)[\W_])+", ' ', s).strip() )
# => i'm sorry sir but this is a gluten free restaurant we don't serve bread


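Details:

  • (?: - start of a non-capturing group:
  • (?!\b['‘’]\b) - a negative lookahead that fails the match if the next character is an apostrophe (straight or curly) with a word character on each side
  • [\W_] - a non-word character or _
  • )+ - end of the group, repeated one or more times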
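
To see how this pattern behaves on a few other inputs, here is a small sketch. The function name normalize and the sample strings are assumptions made for illustration only.

```python
import re

# Same pattern as above: a run of non-word characters or underscores,
# skipping any apostrophe that has a word character on both sides.
PATTERN = re.compile(r"(?:(?!\b['‘’]\b)[\W_])+")


def normalize(text):
    """Collapse each run of special characters into one space (hypothetical helper)."""
    return PATTERN.sub(" ", text).strip()


print(normalize("we don’t serve bread."))  # curly apostrophe inside a word is kept
print(normalize("snake_case, 'quoted'"))   # underscore and edge apostrophes are replaced
print(normalize("gluten-free,   bread"))   # a whole run becomes a single space
```

Because of the + quantifier, spaces are part of the matched runs, so consecutive separators collapse into one space. Underscores are treated as special characters here, and the final strip() removes the space left by trailing punctuation.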
