regex: Regular expression to get all target words after a pattern

ui7jx7zq, posted on 2023-02-17 in Other

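I need a regular expression that captures every occurrence of the target words in an alternation (here AMIDO or TALCO) appearing after a word that starts with REVESTI. That word may or may not be followed by other, non-target words before the target words. The replacement is then done with gsub: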

st1 <- "LUVA TALCO AMIDO"
st2 <- "LUVA REVESTIDAS AMIDO TALCO LUBRIFIC"
st3 <- "LUVA REVESTIMENTO COM TALCO AMIDO "
list_strings <- list(st1, st2, st3)

lapply(list_strings, function(x) gsub("REVEST\\w+ .*?(AMIDO|TALCO)", "rev \\1;", x, perl = TRUE))

[[1]]
[1] "LUVA TALCO AMIDO"               # CORRECT, because REVESTIXXX is not present    

[[2]]
[1] "LUVA rev AMIDO; TALCO LUBRIFIC" # WRONG, expected "LUVA rev AMIDO; rev TALCO;" 

[[3]]
[1] "LUVA rev TALCO; AMIDO"          # WRONG, expected "LUVA rev TALCO; rev AMIDO;"

The regex can be found at this link.
Can anyone help me? Regards.

> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=Portuguese_Brazil.utf8  LC_CTYPE=Portuguese_Brazil.utf8    LC_MONETARY=Portuguese_Brazil.utf8 LC_NUMERIC=C                      
[5] LC_TIME=Portuguese_Brazil.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.2.1  tools_4.2.1     rstudioapi_0.14
Answer #1 (oymdgrw7)

Maybe this helps:

lapply(list_strings, function(x) 
  gsub("(REVEST\\w+)\\s+.*?\\b(AMIDO|TALCO)\\b\\s+\\b(TALCO|AMIDO)\\b\\s+.*", 
    "rev \\2; rev \\3", x))
Output:
[[1]]
[1] "LUVA TALCO AMIDO"

[[2]]
[1] "LUVA rev AMIDO; rev TALCO"

[[3]]
[1] "LUVA rev TALCO; rev AMIDO"
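
One thing that may be worth checking before relying on this pattern: as written, it appears to require exactly two target words back to back, followed by whitespace. The sketch below runs the same pattern on two hypothetical inputs that are not in the question (`one_target` and `no_trailing` are names made up for illustration). I would expect both to come back unchanged, because the pattern finds nothing to match.

```r
# Pattern copied verbatim from the answer above.
pat <- "(REVEST\\w+)\\s+.*?\\b(AMIDO|TALCO)\\b\\s+\\b(TALCO|AMIDO)\\b\\s+.*"

# Hypothetical inputs, not part of the question.
one_target  <- "LUVA REVESTIDAS AMIDO LUBRIFIC"  # only one target word
no_trailing <- "LUVA REVESTIDAS AMIDO TALCO"     # nothing after the second target

res_one  <- gsub(pat, "rev \\2; rev \\3", one_target)
res_tail <- gsub(pat, "rev \\2; rev \\3", no_trailing)

# TRUE means gsub found no match and returned the input unchanged.
print(identical(res_one, one_target))
print(identical(res_tail, no_trailing))
```

If inputs like these can occur in the real data, a more general approach is probably needed.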
Answer #2 (5t7ly7z5)

The following regular expression does this:

REVEST(?:\w* ?)+?((?:((AMIDO|TALCO)| )+ ?)+)

However, if a non-target word sits between two target words, the second target word is not captured (see the fourth line at the link below):
https://regex101.com/r/I2Pwu1/3
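
The limitation with a non-target word between two targets might be sidestepped by not doing everything in one regex. Below is a minimal base-R sketch that works in two steps: first locate the word starting with REVEST, then collect every target word in the rest of the string. The function name `rewrite_after_revest` and its `targets` argument are made up for illustration. It assumes, based on the expected outputs in the question, that the REVEST word and any non-target words after it are dropped, and that a string with no target after REVEST should be left untouched.

```r
# Hypothetical helper: two-step alternative to a single gsub call.
rewrite_after_revest <- function(x, targets = c("AMIDO", "TALCO")) {
  # Step 1: find the word starting with REVEST and everything after it.
  m <- regexpr("\\bREVEST\\w+.*", x, perl = TRUE)
  if (m[1] == -1) return(x)            # no REVEST word: leave the string as is

  prefix <- substr(x, 1, m[1] - 1)     # text before the REVEST word
  rest   <- regmatches(x, m)           # REVEST word plus the remainder

  # Step 2: pull out every target word, in order of appearance.
  target_re <- paste0("\\b(?:", paste(targets, collapse = "|"), ")\\b")
  found <- regmatches(rest, gregexpr(target_re, rest, perl = TRUE))[[1]]
  if (length(found) == 0) return(x)    # assumption: nothing to rewrite

  # Rebuild: prefix, then "rev <TARGET>;" for each target found.
  paste0(prefix, paste0("rev ", found, ";", collapse = " "))
}

st1 <- "LUVA TALCO AMIDO"
st2 <- "LUVA REVESTIDAS AMIDO TALCO LUBRIFIC"
st3 <- "LUVA REVESTIMENTO COM TALCO AMIDO "
st4 <- "LUVA REVESTIDAS AMIDO COM TALCO"  # hypothetical: non-target word in between

res <- lapply(list(st1, st2, st3, st4), rewrite_after_revest)
print(res)
```

On the three strings from the question this should give "LUVA TALCO AMIDO", "LUVA rev AMIDO; rev TALCO;" and "LUVA rev TALCO; rev AMIDO;". The trade-off is that it is a function applied per string rather than a drop-in pattern for gsub.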
