Use stringr in R to find the remaining string after the last substring [duplicate]

Posted by dvtswwa3 on 2023-04-22 in Other


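This question already has answers here:

R - replace last instance of a regex match and everything afterwards (4 answers)
Closed 2 years ago.
How can I use str_match to extract the remaining string after the last occurrence of a substring?
For example, for the string "apples and oranges and bananas with cream", I would like to extract the remainder of the string after the last occurrence of "and", so that it returns "bananas with cream".
I have tried many variations of the command below, but it either keeps returning the remainder of the string after the first "and", or it returns an empty string.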

library(stringr)

str_match("apples and oranges and bananas with cream", "(?<= and ).*(?! and )")
    
    #     [,1]                             
    #[1,] "oranges and bananas with cream"

I have searched StackOverflow and found solutions for javascript, Python and base R, but none for the stringr package.
Thanks.
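One reading of why the attempt above fails: the regex engine reports the leftmost match, so `(?<= and )` is satisfied right after the first " and ", the greedy `.*` then runs to the end of the string, and `(?! and )` is trivially true there. A minimal sketch of a lookaround-only variant follows, assuming the delimiter is always " and " surrounded by single spaces; the variable name `tail_part` is only illustrative.

```r
library(stringr)

s <- "apples and oranges and bananas with cream"

# (?<= and )   start immediately after an " and "
# (?!.* and )  but only at a position with no further " and " ahead
# .*           then take everything up to the end of the string
tail_part <- str_extract(s, "(?<= and )(?!.* and ).*")

print(tail_part)
```

The negative lookahead is placed before `.*` so that it constrains where the match may start, rather than what follows the match.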

regex

Source: https://stackoverflow.com/questions/50184888/use-stringr-in-r-to-find-the-remaining-string-after-last-substring

3 Answers


xt0899hw1#

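(I don't know str_match; base R regex should be sufficient.) Regex quantifiers are "greedy": .+ first consumes as much of the string as it can and then backtracks only until the rest of the pattern fits, so "and " ends up matching its last occurrence. The solution is just: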

sub("^.+and ", "", "apples and oranges and bananas with cream")
#[1] "bananas with cream"
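To make the role of greediness concrete, here is a small sketch comparing the greedy quantifier with its lazy counterpart. It uses `perl = TRUE` so that `.+?` is interpreted by PCRE; the variable names `greedy` and `lazy` are only illustrative.

```r
s <- "apples and oranges and bananas with cream"

# Greedy: .+ runs to the end of the string, then backtracks until
# "and " can match, so everything up to the LAST "and " is removed.
greedy <- sub("^.+and ", "", s, perl = TRUE)

# Lazy: .+? grows one character at a time, so it stops at the
# FIRST "and " and only that prefix is removed.
lazy <- sub("^.+?and ", "", s, perl = TRUE)

print(greedy)
print(lazy)
```

The lazy version reproduces the behaviour described in the question (everything after the first "and"), which is why the greedy form is the one wanted here.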

I was sure there would be something similar in the "lubridate" corner of the hadleyverse.
And then failing:

library(lubridate)

Attaching package: ‘lubridate’

The following object is masked from ‘package:plyr’:

    here

The following objects are masked from ‘package:data.table’:

    hour, isoweek, mday, minute, month, quarter, second, wday, week, yday, year

The following object is masked from ‘package:base’:

    date

> str_replace("apples and oranges and bananas with cream", "^.+and ", "")
Error in str_replace("apples and oranges and bananas with cream", "^.+and ",  : 
  could not find function "str_replace"

So it is not in pkg:lubridate but rather in stringr (which, as I understand it, is a very light wrapper around the stringi package):

library(stringr)
 str_replace("apples and oranges and bananas with cream", "^.+and ", "")
[1] "bananas with cream"

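I really wish people who ask questions about functions from non-base packages would include a library call, to give answerers a clue about their working environment.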


wtzytmuj2#

Another easy way is to use a variation of the *SKIP what's to avoid schema, i.e. What_I_want_to_avoid|(What_I_want_to_match):

library(stringr)
s  <- "apples and oranges and bananas with cream"
str_match(s, "^.+and (.*)")[,2]

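The key idea here is to completely disregard the overall match returned by the regex engine: that is the trash bin. Instead, we only inspect capture group 1, which str_match returns in column [,2]; when it is set, it contains what we are looking for. See also: http://www.rexegg.com/regex-best-trick.html#pseudoregex
We can do something similar with the base R gsub function, e.g.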

gsub("^.+and (.*)", "\\1", s, perl = TRUE)
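One practical difference between the two approaches shows up when an input contains no "and " at all. The sketch below uses a made-up second input, "plain rice", and illustrative variable names to show it: the capture-group route reports a missing value, while the substitution route hands the input back unchanged.

```r
library(stringr)

# The second element is a hypothetical input with no "and " in it.
x <- c("apples and oranges and bananas with cream", "plain rice")

# Capture group via str_match: NA where the pattern does not match.
via_match <- str_match(x, "^.+and (.*)")[, 2]

# Substitution via sub: the string is returned untouched where
# the pattern does not match.
via_sub <- sub("^.+and (.*)", "\\1", x)

print(via_match)
print(via_sub)
```

Which behaviour is preferable depends on whether a silent pass-through or an explicit NA is easier to handle downstream.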

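PS: Unfortunately, we cannot use the What_I_want_to_avoid(*SKIP)(*FAIL)|What_I_want_to_match pattern with stringi/stringr functions, because the ICU regex library they rely on does not include the (*SKIP)(*FAIL) verbs (they are only available in PCRE).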


m528fe3b3#

If we need str_match:

library(stringr)
str_match("apples and oranges and bananas with cream",   ".*\\band\\s(.*)")[,2]
#[1] "bananas with cream"
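The `\\b` in this pattern matters when "and" can also appear at the end of a longer word. A small sketch with a made-up input, "salt and sand castles", compares the pattern without a word boundary against the one above; the names `loose` and `strict` are only illustrative.

```r
library(stringr)

# Hypothetical input in which "sand " also ends in "and ".
s2 <- "salt and sand castles"

# Without a word boundary, the last "and " is the tail of "sand ".
loose <- str_match(s2, "^.+and (.*)")[, 2]

# With \\b, "and" must start at a word boundary, so only the
# standalone word "and" qualifies as the delimiter.
strict <- str_match(s2, ".*\\band\\s(.*)")[, 2]

print(loose)
print(strict)
```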

Or there is stri_match_last from stringi:

library(stringi)
stri_match_last("apples and oranges and bananas with cream",
                regex = ".*\\band\\s(.*)")[,2]
#[1] "bananas with cream"
