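I need a regular expression that captures every occurrence of the alternated target words (in this example, AMIDO or TALCO) that appears after a word starting with REVESTI. That word may or may not be followed by other, non-target words before the targets. The replacement is then done with gsub: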
st1 <- "LUVA TALCO AMIDO"
st2 <- "LUVA REVESTIDAS AMIDO TALCO LUBRIFIC"
st3 <- "LUVA REVESTIMENTO COM TALCO AMIDO "
list_strings <- list(st1, st2, st3)
lapply(list_strings, function(x) gsub("REVEST\\w+ .*?(AMIDO|TALCO)", "rev \\1;", x, perl = TRUE))
[[1]]
[1] "LUVA TALCO AMIDO" # CORRECT, because REVESTIXXX is not present
[[2]]
[1] "LUVA rev AMIDO; TALCO LUBRIFIC" # WRONG, expected "LUVA rev AMIDO; rev TALCO;"
[[3]]
[1] "LUVA rev TALCO; AMIDO" # WRONG, expected "LUVA rev TALCO; rev AMIDO;"
The regex can be found at this link.
Can anyone help me? Regards.
> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=Portuguese_Brazil.utf8 LC_CTYPE=Portuguese_Brazil.utf8 LC_MONETARY=Portuguese_Brazil.utf8 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Brazil.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.2.1 tools_4.2.1 rstudioapi_0.14
2 Answers
Answer #1 (oymdgrw7)
Maybe this helps.
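Answer #2 (5t7ly7z5)
The following regex does this:
However, if a non-target word sits between two target words, the second target word is not captured (see the fourth line at the link below); that case is not handled yet.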
https://regex101.com/r/I2Pwu1/3
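For the case noted above, where a non-target word sits between two targets, one way to avoid a single-pass regex is a two-step approach in base R: split the string at the first REVEST word, then pull every target word out of what follows. This is only a minimal sketch, not the regex from either answer. The helper name rev_targets is hypothetical, and it assumes that every non-target word after the REVEST word should be dropped, which is what the expected outputs in the question suggest.

```r
# Hypothetical helper, not part of the original question or answers.
# Assumption: after the REVEST... word, only target words are kept;
# every other word that follows it is dropped.
rev_targets <- function(x, targets = c("AMIDO", "TALCO")) {
  alt <- paste(targets, collapse = "|")
  # Group 1: text before the first REVEST... word; group 2: text after it.
  pat <- "^(.*?)\\bREVEST\\w+(.*)$"
  if (!grepl(pat, x, perl = TRUE)) return(x)  # no REVEST word: leave unchanged
  before <- sub(pat, "\\1", x, perl = TRUE)
  after  <- sub(pat, "\\2", x, perl = TRUE)
  # Collect every whole-word target that occurs after the REVEST word.
  hits <- regmatches(
    after,
    gregexpr(sprintf("\\b(?:%s)\\b", alt), after, perl = TRUE)
  )[[1]]
  if (length(hits) == 0) return(x)            # no target found: leave unchanged
  paste0(before, paste0("rev ", hits, ";", collapse = " "))
}

st1 <- "LUVA TALCO AMIDO"
st2 <- "LUVA REVESTIDAS AMIDO TALCO LUBRIFIC"
st3 <- "LUVA REVESTIMENTO COM TALCO AMIDO "
result <- vapply(list(st1, st2, st3), rev_targets, character(1))
print(result)
# [1] "LUVA TALCO AMIDO"
# [2] "LUVA rev AMIDO; rev TALCO;"
# [3] "LUVA rev TALCO; rev AMIDO;"
```

Because the targets are collected with gregexpr rather than matched in one pass, an input such as "LUVA REVESTIDA AMIDO COM TALCO" still yields both targets. If trailing non-target words such as LUBRIFIC should be preserved instead, the last paste0 step would need to change.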