Python: How to retrieve the text removed by regex sub?

Posted by gxwragnw on 2023-02-11 in Python


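I have a regex in Python that should remove every occurrence of the word "NOTE." together with the sentence that follows it. How can I do this correctly, and also return all of the removed sentences?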

import re
text = "NOTE. This is the subsequent sentence to be removed. The weather is good. NOTE. This is another subsequent sentence to be removed. The sky is blue. Note that it's a dummy text."
clean_text = re.sub(r"NOTE\..*?(?=\.)", "", text)
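For reference, this is what the attempt above actually returns. The lookahead `(?=\.)` only checks that a dot comes next without consuming it, so the dot that ends each removed sentence is left behind, and nothing records the removed text. The snippet below is just the question's code with a `print` added.

```python
import re

text = "NOTE. This is the subsequent sentence to be removed. The weather is good. NOTE. This is another subsequent sentence to be removed. The sky is blue. Note that it's a dummy text."

# The original attempt, written as a raw string to avoid invalid-escape warnings.
# (?=\.) asserts that a dot follows but does not consume it, so each removed
# sentence's closing dot survives the substitution.
# "Note that" is untouched because the pattern is case-sensitive.
clean_text = re.sub(r"NOTE\..*?(?=\.)", "", text)
print(clean_text)
# . The weather is good. . The sky is blue. Note that it's a dummy text.
```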

Expected result:

Cleaned text:

The weather is good. The sky is blue. Note that it's a dummy text.

Unique sentences removed:

["This is the subsequent sentence to be removed.", "This is another subsequent sentence to be removed."]


Source: https://stackoverflow.com/questions/75413603/how-to-retrieve-the-text-removed-by-regex-sub

3 answers


Answer #1 (v1uwarro)

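Stealing The fourth bird's regex, but using re.split so that we only need to search once. Because the pattern has one capture group, re.split returns a list that alternates between the unmatched parts and the captured sentences. Join the former to get the text; the latter are your removals.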

import re
 
pattern = r"\bNOTE\.\s*([^.]*\.)\s*"
text = "NOTE. This is the subsequent sentence to be removed. The weather is good. NOTE. This is another subsequent sentence to be removed. The sky is blue. Note that it's a dummy text."
 
parts = re.split(pattern, text)
 
clean_text = ''.join(parts[::2])
print(clean_text)
 
unique_sentences_removed = parts[1::2]
print(unique_sentences_removed)

Output:

The weather is good. The sky is blue. Note that it's a dummy text.
['This is the subsequent sentence to be removed.', 'This is another subsequent sentence to be removed.']
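The alternating layout that this answer relies on can be checked on a shorter, made-up string. With one capture group, re.split inserts the captured text between the unmatched pieces; if the input starts with a match, the first element is an empty string, which is harmless when joining. The second call illustrates a pitfall: every capturing group is inserted, so adding another group changes the stride and `parts[::2]` would no longer be the kept text.

```python
import re

sample = "NOTE. Drop me. Keep me."

# One capture group: even indexes are kept text, odd indexes are captured sentences.
parts = re.split(r"\bNOTE\.\s*([^.]*\.)\s*", sample)
print(parts)  # ['', 'Drop me.', 'Keep me.']

# Two capture groups: both are inserted, so the stride becomes 3.
parts_two_groups = re.split(r"\b(NOTE)\.\s*([^.]*\.)\s*", sample)
print(parts_two_groups)  # ['', 'NOTE', 'Drop me.', 'Keep me.']
```

If extra grouping is needed, a non-capturing group `(?:...)` keeps the two-element stride.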


Answered 2023-02-11

Answer #2 (lmyy7pcs)

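One option to remove the NOTE parts is to use a pattern that also matches the dot ending the following sentence, plus optional trailing whitespace characters, instead of only asserting the dot with a lookahead.
If you add a capture group to the pattern, you can use re.findall with the same pattern to return the capture group values.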
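The effect of the capture group on re.findall can be seen on a shorter, made-up string: without a group, findall returns whole matches (including "NOTE." and the trailing space); with exactly one group, it returns only what that group captured.

```python
import re

text = "NOTE. First. Keep. NOTE. Second. Keep."

# No capture group: each item is the whole match.
whole = re.findall(r"\bNOTE\.\s*[^.]*\.\s*", text)
# One capture group: each item is only the text captured by group 1.
captured = re.findall(r"\bNOTE\.\s*([^.]*\.)\s*", text)
print(whole)     # ['NOTE. First. ', 'NOTE. Second. ']
print(captured)  # ['First.', 'Second.']
```

With two or more groups, findall returns a tuple of group values per match instead of a plain string.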
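The pattern matches:

\bNOTE\.\s* matches the word NOTE (at a word boundary), then a literal . and optional whitespace characters
([^.]*\.) capture group 1: matches optional characters other than ., then a literal .
\s* matches optional whitespace characters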
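The same breakdown can live inside the pattern itself. This sketch rewrites the answer's pattern with `re.VERBOSE`, where unescaped whitespace is ignored and `#` starts a comment; `NOTE_PATTERN` is just an illustrative name. It should behave the same as the compact pattern.

```python
import re

text = "NOTE. This is the subsequent sentence to be removed. The weather is good. NOTE. This is another subsequent sentence to be removed. The sky is blue. Note that it's a dummy text."

# The answer's pattern, spelled out piece by piece with re.VERBOSE.
NOTE_PATTERN = re.compile(
    r"""
    \bNOTE\.    # the word NOTE at a word boundary, then a literal dot
    \s*         # optional whitespace
    ([^.]*\.)   # group 1: zero or more non-dot characters, then a dot
    \s*         # optional whitespace after the removed sentence
    """,
    re.VERBOSE,
)

clean_text = NOTE_PATTERN.sub("", text)
removed = NOTE_PATTERN.findall(text)
print(clean_text)
print(removed)
```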

See the matches and capture group values in this regex101 demo and a Python demo.

import re
 
pattern = r"\bNOTE\.\s*([^.]*\.)\s*"
text = "NOTE. This is the subsequent sentence to be removed. The weather is good. NOTE. This is another subsequent sentence to be removed. The sky is blue. Note that it's a dummy text."
clean_text = re.sub(pattern, "", text)
print(clean_text)
 
unique_sentences_removed = re.findall(pattern, text)
print(unique_sentences_removed)

Output:

The weather is good. The sky is blue. Note that it's a dummy text.
['This is the subsequent sentence to be removed.', 'This is another subsequent sentence to be removed.']

Answered 2023-02-11

Answer #3 (zbq4xfa0)

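You can capture the removed sentences in a single pass by passing re.sub a replacement function that, as a side effect, saves each removed sentence: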

import re

def clean(text):
    removed = []
    def repl(m):
        removed.append(m.group(1))
        return ''
    clean_text = re.sub(r"NOTE\.\s*(.*?\.)\s*", repl, text)  # raw string avoids invalid-escape warnings
    return clean_text, removed

text = "NOTE. This is the subsequent sentence to be removed. The weather is good. NOTE. This is another subsequent sentence to be removed. The sky is blue. Note that it's a dummy text."
result, removed = clean(text)
print(result)
print(removed)

Output:

The weather is good. The sky is blue. Note that it's a dummy text.
['This is the subsequent sentence to be removed.', 'This is another subsequent sentence to be removed.']
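None of the answers drops duplicates, although the question asks for unique sentences; with the sample text each removed sentence happens to occur once. If "unique" is meant literally, an order-preserving de-duplication can be applied to whichever list is produced. A minimal sketch, using made-up sample text with a repeated sentence:

```python
import re

text = "NOTE. Repeat me. A. NOTE. Repeat me. B. NOTE. Once. C."
removed = re.findall(r"\bNOTE\.\s*([^.]*\.)\s*", text)

# dict keys keep insertion order (guaranteed since Python 3.7), so this
# drops duplicates while keeping the order of first appearance.
unique_removed = list(dict.fromkeys(removed))
print(unique_removed)  # ['Repeat me.', 'Once.']
```

Using `set(removed)` instead would also drop duplicates but lose the original order.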

Answered 2023-02-11
