I want to extract a phrase pattern from the sentences below.
text1 <- "On a year-on-year basis, the number of subscribers of Netflix increased 1.15% in November last year."
text2 <- "There is no confirmed audited number of subscribers in the Netflix's earnings report."
text3 <- "Netflix's unaudited number of subscribers has grown more than 1.50% at the last quarter."
The phrase can be number of subscribers, audited number of subscribers, or unaudited number of subscribers.
I used the pattern \\bnumber\\s+of\\s+subscribers?\\b from a previous question (thanks to @wiktor-stribizew) and then extracted the phrase:
library(stringr)

find_words <- function(text){
  pattern <- "\\bnumber\\s+of\\s+subscribers?\\b" # something like this
  str_extract(text, pattern)
}
However, this extracts only the exact phrase number of subscribers and misses the audited/unaudited prefix in the other two sentences.
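To reproduce the behaviour described above, a small check can run the question's function over all three sample sentences at once. The vector name `texts` is only for illustration; `str_extract` is vectorised, so it returns one match per sentence.

```r
library(stringr)

# The question's function: it matches only "number of subscriber(s)",
# so an "audited"/"unaudited" prefix never becomes part of the result.
find_words <- function(text) {
  pattern <- "\\bnumber\\s+of\\s+subscribers?\\b"
  str_extract(text, pattern)
}

# Illustrative vector holding the three sample sentences
texts <- c(
  "On a year-on-year basis, the number of subscribers of Netflix increased 1.15% in November last year.",
  "There is no confirmed audited number of subscribers in the Netflix's earnings report.",
  "Netflix's unaudited number of subscribers has grown more than 1.50% at the last quarter."
)

# Every element comes back as "number of subscribers"
find_words(texts)
```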
Expected output:
find_words(text1)
'number of subscribers'
find_words(text2)
'audited number of subscribers'
find_words(text3)
'unaudited number of subscribers'
1 answer
See if this works.
You can test it with the sample text you provided:
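One possible pattern, offered as a sketch rather than as the answerer's exact code, assumes the only prefixes to capture are `audited` and `unaudited`. It wraps an optional prefix in non-capturing groups, which stringr's ICU regex engine supports, so the plain phrase still matches when no prefix is present.

```r
library(stringr)

# Sketch: allow an optional "audited " or "unaudited " before the phrase.
# (?:un)? makes the "un" optional; the outer (?:...)? makes the whole
# prefix optional, so plain "number of subscribers" still matches.
find_words <- function(text) {
  pattern <- "\\b(?:(?:un)?audited\\s+)?number\\s+of\\s+subscribers?\\b"
  str_extract(text, pattern)
}

text1 <- "On a year-on-year basis, the number of subscribers of Netflix increased 1.15% in November last year."
text2 <- "There is no confirmed audited number of subscribers in the Netflix's earnings report."
text3 <- "Netflix's unaudited number of subscribers has grown more than 1.50% at the last quarter."

find_words(text1)  # "number of subscribers"
find_words(text2)  # "audited number of subscribers"
find_words(text3)  # "unaudited number of subscribers"
```

Other qualifiers (for example "confirmed") would need to be added to the prefix alternation explicitly; this sketch deliberately captures only the two prefixes named in the question.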