library(stringr)
x <- "uncertainty negatively influences economic agents investment and business decisions which leads to decrease in demand. When the economic environment is fraught with uncertainty and the future is unclear businesses and firms may hold back their decisions until uncertainty subsides. Ever since the start of the pandemic global economic outlook has been unclear with unprecedented uncertainty leading to fall in demand."
regex <- "(uncertainty|unclear)\\s(\\w+\\s){1,30}(global|decrease in demand|fall in demand)"
str_count(x, regex)
# [1] 2
str_extract_all(x, regex)
# [[1]]
# [1] "uncertainty negatively influences economic agents investment and business decisions which leads to decrease in demand"
# [2] "unclear with unprecedented uncertainty leading to fall in demand"
1 Answer
I'm not sure whether the misspelling of "uncertainty" in your text is intentional, so I corrected it, but try this:
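The pattern starts matching at `uncertainty` or `unclear` (`(uncertainty|unclear)`), followed by a whitespace character `\\s`. Next, `(\\w+\\s)` matches one or more word characters `\\w` (letters, digits, and underscore) followed by a whitespace character `\\s`. This group must repeat between 1 and 30 times (`{1,30}`), and the match must end at `global`, `decrease in demand`, or `fall in demand`. Technically, all of the capture groups could be made non-capturing with `?:`, since you don't need backreferences and don't need to capture them individually.

The text you posted has an interesting case in the last sentence: "Ever since the start of the pandemic global economic outlook has been unclear with unprecedented uncertainty leading to fall in demand."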
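Depending on how you read it, this could actually contain two matches:
1. "unclear with unprecedented uncertainty leading to fall in demand"
2. "uncertainty leading to fall in demand"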
If that is your interpretation, the text you posted should produce three matches, not two.
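A regex engine reports non-overlapping matches: once `unclear with unprecedented uncertainty leading to fall in demand` has been consumed, the search resumes after it, so the nested `uncertainty leading to fall in demand` is never returned. A common workaround, sketched here under the assumption that you really want every overlapping span, is to wrap the whole pattern in a zero-width lookahead `(?=(...))` and read the captured group with `str_match_all()`. The variable names `overlap_regex` and `overlapping` are illustrative.

```r
library(stringr)

x <- "uncertainty negatively influences economic agents investment and business decisions which leads to decrease in demand. When the economic environment is fraught with uncertainty and the future is unclear businesses and firms may hold back their decisions until uncertainty subsides. Ever since the start of the pandemic global economic outlook has been unclear with unprecedented uncertainty leading to fall in demand."

# The original pattern with non-capturing inner groups, wrapped in a
# zero-width lookahead (?=(...)). The lookahead consumes no characters,
# so the engine retries at the next position and can report spans that
# overlap an earlier one.
overlap_regex <- "(?=((?:uncertainty|unclear)\\s(?:\\w+\\s){1,30}(?:global|decrease in demand|fall in demand)))"

# str_match_all() returns one matrix per input string: column 1 holds the
# (empty) lookahead match, column 2 the span captured inside it.
overlapping <- str_match_all(x, overlap_regex)[[1]][, 2]
overlapping
```

With this text the lookahead version should return three spans: the two from the original output plus `uncertainty leading to fall in demand`. The greedy `{1,30}` still picks the longest possible end point for each starting word.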
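Just to point out:
"uncertainty subsides. Ever since the start of the pandemic global economic outlook has been unclear with unprecedented uncertainty leading to fall in demand."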
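The remark that all capture groups could be made non-capturing with `?:` can be checked directly. In this minimal sketch (the name `regex_nc` is illustrative), the groups are only used for alternation and repetition, so switching them to `(?:...)` leaves the full matches unchanged; the difference only shows up in functions that report groups, such as `str_match()`.

```r
library(stringr)

x <- "uncertainty negatively influences economic agents investment and business decisions which leads to decrease in demand. When the economic environment is fraught with uncertainty and the future is unclear businesses and firms may hold back their decisions until uncertainty subsides. Ever since the start of the pandemic global economic outlook has been unclear with unprecedented uncertainty leading to fall in demand."

# Pattern from the question: three capturing groups.
regex <- "(uncertainty|unclear)\\s(\\w+\\s){1,30}(global|decrease in demand|fall in demand)"
# Same pattern with non-capturing groups (?:...).
regex_nc <- "(?:uncertainty|unclear)\\s(?:\\w+\\s){1,30}(?:global|decrease in demand|fall in demand)"

# The extracted full matches are identical (TRUE).
identical(str_extract_all(x, regex), str_extract_all(x, regex_nc))

# str_match() returns one column for the full match plus one per capturing group.
ncol(str_match(x, regex))     # 4: full match + 3 groups
ncol(str_match(x, regex_nc))  # 1: full match only
```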