regex How to extract different patterns in a string in R?

Posted by hgb9j2n6 on 2023-01-18 in Other

I want to extract a phrase pattern from the sentences below.

text1 <- "On a year-on-year basis, the number of subscribers of Netflix increased 1.15% in November last year."

text2 <- "There is no confirmed audited number of subscribers in the Netflix's earnings report."

text3 <- "Netflix's unaudited number of subscribers has grown more than 1.50% at the last quarter."

The pattern is number of subscribers, audited number of subscribers, or unaudited number of subscribers.
I used the pattern \\bnumber\\s+of\\s+subscribers?\\b from a previous question (thanks to @wiktor-stribizew) to extract the phrase.

library(stringr) # provides str_extract()

find_words <- function(text){

  pattern <- "\\bnumber\\s+of\\s+subscribers?\\b" # something like this

  str_extract(text, pattern)

}

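However, this extracts only the bare phrase number of subscribers, not the other two patterns.
Expected output:
find_words(text1)
'number of subscribers'
find_words(text2)
'audited number of subscribers'
find_words(text3)
'unaudited number of subscribers'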

regex

Source: https://stackoverflow.com/questions/75148919/how-to-extract-different-patterns-in-string-in-r

1 Answer

50pmv0ei1#

See if this works:

find_words <- function(text){

  # The optional group matches "audited " or "unaudited " when present
  pattern <- "(audited |unaudited )?number\\s+of\\s+subscribers"

  str_extract(text, pattern)

}

You can test it with the sample texts you provided:

find_words(text1)
# 'number of subscribers'
find_words(text2)
# 'audited number of subscribers'
find_words(text3)
# 'unaudited number of subscribers'
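
The pattern above has no word boundaries, so a hypothetical input such as "preaudited number of subscribers" would yield 'audited number of subscribers'. If that matters, a slightly stricter variant might look like the sketch below. It assumes the stringr package is installed; the function name find_words_strict is made up for illustration and does not come from the original answer.

```r
library(stringr)

text1 <- "On a year-on-year basis, the number of subscribers of Netflix increased 1.15% in November last year."
text2 <- "There is no confirmed audited number of subscribers in the Netflix's earnings report."
text3 <- "Netflix's unaudited number of subscribers has grown more than 1.50% at the last quarter."

# Hypothetical stricter variant of find_words() (name is an assumption).
find_words_strict <- function(text) {
  # \\b                        : the match must start at a word boundary
  # (?:(?:un)?audited\\s+)?    : optional "audited " or "unaudited " prefix
  # number\\s+of\\s+subscribers? : the core phrase, singular or plural
  # \\b                        : the match must end at a word boundary
  pattern <- "\\b(?:(?:un)?audited\\s+)?number\\s+of\\s+subscribers?\\b"
  str_extract(text, pattern)
}

# str_extract() is vectorised, so all three texts can be checked at once.
find_words_strict(c(text1, text2, text3))
```

Because the prefix group is optional, the same function handles all three sentences, and an input without the phrase gives NA rather than an error.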

Answered 2023-01-18
