regex - Finding the singular or plural form of a word with a regular expression

wwwo4jvm posted on 2023-01-14 in Other

Suppose I have a sentence like this:

sentence = "A cow runs on the grass"

If I want to replace the word cow with a special token, I can do this:

import re

to_replace = "cow"
# A <SPECIAL> runs on the grass
sentence = re.sub(rf"(?!\B\w)({re.escape(to_replace)})(?<!\w\B)", "<SPECIAL>", sentence, count=1)
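The lookarounds (?!\B\w) and (?<!\w\B) act as "adaptive" word boundaries: they demand a word boundary only on a side where the search term itself starts or ends with a word character. The sketch below compares them with plain \b; the helper names adaptive and plain, the term c++ and the test sentences are illustrative assumptions, not part of the question.

```python
import re


def adaptive(term):
    # Require a word boundary only where the term begins/ends with a word character
    return rf"(?!\B\w){re.escape(term)}(?<!\w\B)"


def plain(term):
    # Classic \b boundaries on both sides, regardless of the term's first/last character
    return rf"\b{re.escape(term)}\b"


text = "I like c++ a lot"
print(re.search(plain("c++"), text))      # None: there is no \b between "+" and " "
print(re.search(adaptive("c++"), text))   # matches "c++"
print(re.search(adaptive("c++"), "abc++ is not a word"))  # None: "c" is preceded by "b"
```

For an ordinary word like cow both forms behave the same; the difference only shows up when the term begins or ends with punctuation.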

Also, if I want to replace its plural form, I can do this:

sentence = "The cows run on the grass"
to_replace = "cow"
# The <SPECIAL> run on the grass
sentence = re.sub(rf"(?!\B\w)({re.escape(to_replace) + 's?'})(?<!\w\B)", "<SPECIAL>", sentence, count=1)

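This performs the replacement even though to_replace stays in its singular form cow; the optional s? is what lets the plural cows match as well.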
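The s? version can be checked on a singular, an s plural and an es plural. The helper name replace_word below is hypothetical, not from the question. The last line shows why the question arises: gas followed by s? does not match gases, because after "gas" the next character is "e", so the trailing boundary check fails.

```python
import re


def replace_word(sentence, word, token="<SPECIAL>"):
    # Hypothetical helper: the word plus an optional "s", with adaptive word boundaries
    pattern = rf"(?!\B\w)({re.escape(word)}s?)(?<!\w\B)"
    return re.sub(pattern, token, sentence, count=1)


print(replace_word("A cow runs on the grass", "cow"))       # A <SPECIAL> runs on the grass
print(replace_word("The cows run on the grass", "cow"))     # The <SPECIAL> run on the grass
print(replace_word("There are many natural gases", "gas"))  # unchanged: "gases" is not matched
```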
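My question is how to apply the same idea more generally: find and replace a word that may be singular, plural ending in s, or plural ending in es (note that I am deliberately ignoring many possible edge cases, as discussed in the comments on the question). Put another way: how can I add several optional suffixes to a word so that it works for the following examples: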

to_replace = "cow"
sentence1 = "The cow runs on the grass"
sentence2 = "The cows run on the grass"
# --------------
to_replace = "gas"
sentence3 = "There are many natural gases"
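One general way to attach several optional endings is to join them into a non-capturing alternation and make the whole group optional. The sketch below assumes the suffix list ("s", "es") and a hypothetical helper name replace_forms; each suffix is escaped in case it contains regex metacharacters, and longer suffixes are listed first.

```python
import re


def replace_forms(sentence, word, suffixes=("s", "es"), token="<SPECIAL>"):
    # Hypothetical helper: the word followed by at most one of the given suffixes
    endings = "|".join(re.escape(s) for s in sorted(suffixes, key=len, reverse=True))
    pattern = rf"(?!\B\w){re.escape(word)}(?:{endings})?(?<!\w\B)"
    return re.sub(pattern, token, sentence, count=1)


print(replace_forms("The cow runs on the grass", "cow"))     # The <SPECIAL> runs on the grass
print(replace_forms("The cows run on the grass", "cow"))     # The <SPECIAL> run on the grass
print(replace_forms("There are many natural gases", "gas"))  # There are many natural <SPECIAL>
```

With exactly these two suffixes the group is equivalent to (?:e?s)?; the list form is only useful if more endings need to be added later.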

Answer #1 (bfnvny8b)

I suggest using plain Python logic; remember not to stretch regex further than you need to:

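phrase = "There are many cows in the field cowes"
# The accepted forms: singular, plural with "s", plural with "es"
word_forms = {"cow", "cow" + "s", "cow" + "es"}
# Rebuild the phrase word by word so only whole words are replaced
# (str.replace would also rewrite substrings, e.g. "cow" inside "cows")
phrase = " ".join("replacement" if word in word_forms else word for word in phrase.split())
print(phrase)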

Output:

There are many replacement in the field replacement
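Splitting on whitespace leaves punctuation attached to words, so "cows," would not equal "cows". A variation that keeps the same set-membership check but lets re.sub locate the words is sketched below; it assumes a "word" is a run of \w characters, and the callback name swap is hypothetical.

```python
import re

# Accepted forms, mirroring the answer's three comparisons
forms = {"cow", "cow" + "s", "cow" + "es"}


def swap(match):
    # Called for every run of word characters; replace it only if it is an accepted form
    word = match.group()
    return "replacement" if word in forms else word


phrase = "Two cows, one cow and some cowes."
print(re.sub(r"\w+", swap, phrase))
# Two replacement, one replacement and some replacement.
```

Punctuation and spacing are left untouched because only the \w runs are passed to the callback.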

Answer #2 (rvpgvaaj)

Apparently, for the use case I posted, I can simply make the suffix optional, so it can look like this:

sentence = re.sub(rf"(?!\B\w)({re.escape(to_replace) + '(s|es)?'})(?<!\w\B)", "<SPECIAL>", sentence, count=1)

Note that this does not work for many of the edge cases discussed in the comments!
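Running the (s|es)? pattern on the question's three example sentences, plus a few extra inputs chosen here for illustration (not from the thread), makes the limitation concrete: -ies and irregular plurals are missed, while a non-word such as cowes is accepted. The wrapper name replace_plural is hypothetical.

```python
import re


def replace_plural(sentence, to_replace):
    # The answer's pattern: optional "s" or "es" suffix, adaptive word boundaries
    pattern = rf"(?!\B\w)({re.escape(to_replace)}(s|es)?)(?<!\w\B)"
    return re.sub(pattern, "<SPECIAL>", sentence, count=1)


cases = [
    ("The cow runs on the grass", "cow"),     # singular: replaced
    ("The cows run on the grass", "cow"),     # -s plural: replaced
    ("There are many natural gases", "gas"),  # -es plural: replaced
    ("Two cities", "city"),                   # -ies plural: not replaced
    ("Three geese", "goose"),                 # irregular plural: not replaced
    ("Some cowes", "cow"),                    # not a real word, but replaced
]
for sentence, word in cases:
    print(replace_plural(sentence, word))
```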
