regex: Searching text with NEAR regex

Posted by ycggw6v2 on 2023-04-13 in Other


I have a vector of text, split into pieces as follows:

words =  c("Lorem Ipsum is simply dummy text of the", "printing and typesetting industry. Lorem Ipsum has been the industrys 
            standard dummy text ever since the 1500s", "when an unknown printer took a galley of type and scrambled it to 
            make a type specimen book.", "It has survived not only five ,centuries, but also the leap into electronic")

I use the following regex to find where the words "dummy" and "text" occur within 6 words of each other:

grep("\b(?:dummy\\W+(?:\\w+\\W+){1,6}?text|text\\W+(?:\\w+\\W+){1,6}?dummy)\b", words)

However, it returns 0 (no matches), even though "dummy text" is present in the first element.
Do you know where I went wrong?

regex

Source: https://stackoverflow.com/questions/65237757/searching-text-with-near-regex

1 answer


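pu3pd22g1#

In an R string literal, "\b" is a backspace character, so the regex engine receives a literal backspace instead of the \b word-boundary token. You need to double-escape it as "\\b" so that it matches a word boundary.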
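Once that typo is fixed, pay attention to the limiting quantifier. {1,6}? is a lazy quantifier: it matches one to six occurrences of the subpattern it modifies (as few as possible, but as many as needed to find a valid match). That means there must be at least one word between dummy and text, so adjacent "dummy text" can never match.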
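Both problems can be checked directly in R. The small sketch below first shows what "\b" and "\\b" actually contain, then compares the {1,6}? quantifier with {0,6} on two made-up strings. For brevity it only uses the one-directional half of the pattern (dummy before text). Passing perl = TRUE (the PCRE engine) is a choice of this sketch, not something the answer specifies.

```r
# 1) Escaping: in an R string literal "\b" is ONE character, a backspace (code point 8)
nchar("\b")      # 1
utf8ToInt("\b")  # 8
# "\\b" is two characters, a backslash and "b", which the regex engine reads as \b
nchar("\\b")     # 2

s <- "simply dummy text"
grepl("\bdummy\b", s, perl = TRUE)    # FALSE: the pattern looks for literal backspaces
grepl("\\bdummy\\b", s, perl = TRUE)  # TRUE: \b is now a word boundary

# 2) Lower bound: {1,6}? requires at least one word between "dummy" and "text"
lazy_1_6   <- "\\bdummy\\W+(?:\\w+\\W+){1,6}?text\\b"
greedy_0_6 <- "\\bdummy\\W+(?:\\w+\\W+){0,6}text\\b"
x <- c("dummy text", "dummy sample text")
grepl(lazy_1_6, x, perl = TRUE)    # FALSE  TRUE
grepl(greedy_0_6, x, perl = TRUE)  # TRUE  TRUE
```

Only the lower bound matters for whether "dummy text" matches; whether the quantifier is lazy or greedy changes how much is consumed, not whether a match exists.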
So you need to use:

pattern <- "\\b(?:dummy\\W+(?:\\w+\\W+){0,6}text|text\\W+(?:\\w+\\W+){0,6}dummy)\\b"

See the regex demo.
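A runnable sketch of the corrected pattern against the vector from the question. Here each string is written on a single line. That should not change the result, because a run of spaces or line breaks is matched by \W+ either way. The three extra test strings are invented to check the reverse order and the six-word limit. perl = TRUE is again an assumption of this sketch.

```r
words <- c("Lorem Ipsum is simply dummy text of the",
           paste("printing and typesetting industry. Lorem Ipsum has been the industrys",
                 "standard dummy text ever since the 1500s"),
           "when an unknown printer took a galley of type and scrambled it to make a type specimen book.",
           "It has survived not only five ,centuries, but also the leap into electronic")

# Corrected pattern: double-escaped \b and {0,6} so adjacent words are allowed
pattern <- "\\b(?:dummy\\W+(?:\\w+\\W+){0,6}text|text\\W+(?:\\w+\\W+){0,6}dummy)\\b"

grep(pattern, words, perl = TRUE)   # 1 2
grepl(pattern, words, perl = TRUE)  # TRUE  TRUE FALSE FALSE

near_tests <- c("text is just dummy",        # reverse order, 2 words in between
                "dummy a b c d e f text",    # 6 words in between: allowed
                "dummy a b c d e f g text")  # 7 words in between: too far
grepl(pattern, near_tests, perl = TRUE)  # TRUE  TRUE FALSE
```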

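Details

\b - word boundary
(?: - start of a non-capturing group
dummy - the word dummy
\W+ - one or more non-word characters
(?:\w+\W+){0,6} - zero to six occurrences of one or more word characters followed by one or more non-word characters
text - the word text
| - or
text - the word text
\W+ - one or more non-word characters
(?:\w+\W+){0,6} - zero to six occurrences of one or more word characters followed by one or more non-word characters
dummy - the word dummy
) - end of the non-capturing group
\b - word boundary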
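The breakdown above shows that the pattern is one template written twice, once for each word order. You could generate it rather than type it by hand when you need other word pairs or distances. The sketch below uses a hypothetical helper, near_pattern(), which is not part of the answer. It assumes both words are plain words with no regex metacharacters, because they are pasted into the pattern unescaped. perl = TRUE is an assumption of this sketch.

```r
# Hypothetical helper (not from the answer): builds the two-way NEAR pattern
# for words w1 and w2 with at most `max_between` words between them.
# Assumes w1 and w2 contain no regex metacharacters (they are not escaped).
near_pattern <- function(w1, w2, max_between = 6L) {
  gap <- sprintf("\\W+(?:\\w+\\W+){0,%d}", as.integer(max_between))
  sprintf("\\b(?:%s%s%s|%s%s%s)\\b", w1, gap, w2, w2, gap, w1)
}

# Reproduces the pattern from the answer
near_pattern("dummy", "text")

# Other word pairs and distances
grepl(near_pattern("galley", "type", 2L), "took a galley of type", perl = TRUE)           # TRUE
grepl(near_pattern("printer", "type", 2L), "printer took a galley of type", perl = TRUE)  # FALSE
```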

2023-04-13
