How can I extract all matches of a pattern in R and combine only the distinct matches?

rjee0c15 posted on 2023-01-28 in Other

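I want to use a regex pattern to extract all matches from a string and then combine only the distinct matches into a single string.
Specifically, I want to extract every word that comes directly before the word films and then merge only the distinct words. I tried the following script, but it combines all of the matches, duplicates included: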

library(stringr)  # str_extract_all()
library(purrr)    # map_chr()

text1 <- "Netflix announced 34 new Korean films to hit the streaming platform in 2023, along with 12 Japanese films. The upcoming titles, which Netflix calls their “biggest-ever lineup of Korean films and series."

pattern <- "\\b[[:alpha:]]+\\b(?=\\sfilms)"

map_chr(str_extract_all(text1, pattern), paste, collapse = " | ")

> 'Korean | Japanese | Korean'

Expected output:

'Korean | Japanese'

Answer 1 (5rgfhyps)

Try this:

library(stringr)  # str_extract_all()

text1 <- "Netflix announced 34 new Korean films to hit the streaming platform in 2023, along with 12 Japanese films. The upcoming titles, which Netflix calls their “biggest-ever lineup of Korean films and series."

pattern <- "\\b[[:alpha:]]+\\b(?=\\sfilms)"

# str_extract_all() returns a list; take its first element, drop duplicates, then collapse
paste(unique(str_extract_all(text1, pattern)[[1]]), collapse = " | ")

This gives:

"Korean | Japanese"
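
If the input is a vector of several strings rather than a single one, the same idea can be applied per string. The following is only a sketch: the `texts` vector is made-up sample data and `distinct_matches` is a name chosen for illustration, neither comes from the question.

```r
library(stringr)
library(purrr)

# Hypothetical sample input: two strings instead of one
texts <- c(
  "34 new Korean films, 12 Japanese films and more Korean films",
  "French films and Italian films, then French films again"
)

pattern <- "\\b[[:alpha:]]+\\b(?=\\sfilms)"

# str_extract_all() returns a list with one character vector per input string;
# de-duplicate each vector before collapsing it into a single string
distinct_matches <- map_chr(
  str_extract_all(texts, pattern),
  function(m) paste(unique(m), collapse = " | ")
)

print(distinct_matches)
```

Applying `unique()` inside the mapped function keeps the de-duplication separate for each input string, so a word that appears in two different strings is still reported for both.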

Answer 2 (fruv7luv)

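Unlist the result and then keep only the unique elements, as in the code below:

library(stringr)  # str_extract_all()

text1 <- "Netflix announced 34 new Korean films to hit the streaming platform in 2023, along with 12 Japanese films. The upcoming titles, which Netflix calls their “biggest-ever lineup of Korean films and series."

pattern <- "\\b[[:alpha:]]+\\b(?=\\sfilms)"

# unlist() flattens the list of matches into one character vector;
# paste() is called once on the whole vector so the result is a single string
paste(unique(unlist(str_extract_all(text1, pattern))), collapse = " | ")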
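
For anyone who prefers not to load extra packages, roughly the same result should be obtainable with base R alone. This is a sketch under assumptions: the shortened `text1` is sample data rather than the original string, and `matches` and `result` are names picked for illustration.

```r
# Shortened, hypothetical sample text (not the original string from the question)
text1 <- "34 new Korean films, 12 Japanese films, and more Korean films and series."

pattern <- "\\b[[:alpha:]]+\\b(?=\\sfilms)"

# The lookahead (?=...) is a Perl-style construct, so perl = TRUE is used here
matches <- regmatches(text1, gregexpr(pattern, text1, perl = TRUE))[[1]]

# Drop duplicates, then join the remaining words into one string
result <- paste(unique(matches), collapse = " | ")

print(result)
```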
