regex 如何在Python中的两个不同的“无空格”模式之间使用正则表达式替换?

doinxwow  于 2023-05-08  发布在  Python
关注(0)|答案(4)|浏览(105)

我在正则表达式上遇到了点麻烦。从本质上讲,我想删除“<”和“>”之间的所有内容,但前提是“<”后面没有空格,“>”前面没有空格。例如,应删除“”,但不应删除“< a>”和“”。
到目前为止,我有这个:

cleanText = re.sub('<[^>]+>', ' ', text)

这是可行的,除了我不知道如何处理无白色规则。
此时,输入:

txt = "<this should be removed> hello < world > </this should also be removed> this should stay"

返回:

hello     this should stay

而我希望它返回:

hello < world >    this should stay

有什么建议吗?提前感谢。

vwkv1x7d

vwkv1x7d1#

你可以在开头和结尾字符之前使用否定的lookahed和lookbehing:

r'<(?!\s)([^>]*)?(?<!\s)>'
bprjcwpo

bprjcwpo2#

您可以匹配:

<[^<>\s](?:[^<>]*[^<>\s])?>

说明

  • <按字面匹配
  • [^<>\s]匹配除<>以外的单个字符或空白字符
  • (?:非捕获组
  • [^<>]*匹配0+除<>以外的字符
  • [^<>\s]匹配除<>以外的单个字符或空白字符
  • )?关闭非捕获组并将其设为可选
  • >按字面匹配

在替换中使用空字符串。
请参见regex demo

import re
 
pattern = r"<[^<>\s](?:[^<>]*[^<>\s])?>"
txt = "<this should be removed> hello < world > </this should also be removed> this should stay"
result = re.sub(pattern, "", txt)
 
if result:
    print (result)

输出

hello < world >  this should stay

如果你想要空的<>,那么你可以使用in the replacement

a8jjtwal

a8jjtwal3#

您可以将以下正则表达式的匹配项转换为空字符串。

(?<=<(?!\s))[^<>]*?(?=(?<!\s)>)

Demo
此表达式具有以下元素。

(?<=       Begin a negative lookbehind
  <        Match a literal
  (?!\s)   Use a negative lookahead to assert that the previous
           match is not followed by a whitespace character
)          End negative lookbehind
[^<>]*
?          Require previous match to match as few characters as possible
(?=        Begin a positive lookahead
  (?<!\s)  Use a negative lookbehind to assert that the following
           match is not preceded by a whitespace character
  >        Match a literal
)          End the positive lookahead
vohkndzv

vohkndzv4#

也许是这样的:/<\S[^>]*>/g

const text = '<this should be removed> hello < world > </this should also be removed> this should stay';
const result = text.replace(/<\S[^>]*>/g, '');
console.log(result);

相关问题