R: Replace everything after a word given in another column

Asked by iklwldmw on 2023-03-27, posted in Other

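I have a data frame that looks like this:
| String | Word |
| --- | --- |
| tasty red apple number 1 | apple |
| tasty red apple and banana | apple |
| tasty banana and apple and peach | apple |
| tasty banana and peach | banana |
| tasty peach and apple | peach |

I would like to remove everything after the word given in the Word column, keeping that word itself:
| String | Word | After |
| --- | --- | --- |
| tasty red apple number 1 | apple | tasty red apple |
| tasty red apple and banana | apple | tasty red apple |
| tasty banana and apple and peach | apple | tasty banana and apple |
| tasty banana and peach | banana | tasty banana |
| tasty peach and apple | peach | tasty peach |

Does anyone know how to do this?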

string <- c("tasty red apple number 1", "tasty red apple and banana", "tasty banana and apple and peach", "tasty banana and peach", "tasty peach and apple")
word <- c("apple", "apple", "apple", "banana", "peach")
pbpqsu0x #1

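We can capture the characters up to and including the value of `Word` as a group (`(...)`), then use a backreference to that group (`\\1`) in the replacement argument of `str_replace`. The trailing `.*` matches the remaining characters, which are discarded. `str_replace` is vectorized over the pattern as well as the string, so no loop is needed.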

library(dplyr)
library(stringr)
df1 %>%
   mutate(After = str_replace(String, sprintf("(.*%s).*", Word), "\\1"))
Output:
                            String   Word                  After
1         tasty red apple number 1  apple        tasty red apple
2       tasty red apple and banana  apple        tasty red apple
3 tasty banana and apple and peach  apple tasty banana and apple
4           tasty banana and peach banana           tasty banana
5            tasty peach and apple  peach            tasty peach
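
One caveat worth noting: `.*` is greedy, so when the word occurs more than once in a string, the pattern above keeps everything up to the last occurrence. A minimal sketch, assuming you would rather cut at the first occurrence, makes the quantifier lazy (`.*?`). The sample string is made up for illustration and is not part of the question's data.

```r
library(stringr)

# Hypothetical input in which the word occurs twice (not from the question)
s <- "tasty apple and banana and apple pie"
w <- "apple"

# Greedy: the capture group extends to the LAST occurrence of the word
greedy <- str_replace(s, sprintf("(.*%s).*", w), "\\1")

# Lazy: the capture group stops at the FIRST occurrence of the word
lazy <- str_replace(s, sprintf("(.*?%s).*", w), "\\1")

print(greedy)  # "tasty apple and banana and apple"
print(lazy)    # "tasty apple"
```

For the five rows in the question each word occurs only once, so both variants give the same result there.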

Data:

df1 <- structure(list(String = c("tasty red apple number 1",
 "tasty red apple and banana", 
"tasty banana and apple and peach", "tasty banana and peach", 
"tasty peach and apple"), Word = c("apple", "apple", "apple", 
"banana", "peach")), class = "data.frame", row.names = c(NA, 
-5L))
2w2cym1i #2

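Use `gsub` with a lookbehind inside `mapply` to remove the unwanted part of each string. The pattern `(?<=Word).*` matches everything that follows the word, and replacing it with an empty string leaves the text up to and including the word.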

transform(dat, After=mapply(\(x, y) gsub(sprintf('(?<=%s).*',  x), '', y, perl=TRUE), Word, String))
#                             String   Word                  After
# 1         tasty red apple number 1  apple        tasty red apple
# 2       tasty red apple and banana  apple        tasty red apple
# 3 tasty banana and apple and peach  apple tasty banana and apple
# 4           tasty banana and peach banana           tasty banana
# 5            tasty peach and apple  peach            tasty peach
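
The regex-based approaches interpolate `Word` directly into the pattern, so a value containing regex metacharacters (such as `.` or `+`) would be read as regex syntax. A minimal base R sketch, assuming literal matching is what you want, locates the word with `regexpr(fixed = TRUE)` and cuts with `substr`. The helper name `keep_through` and the third sample row are hypothetical. Unlike the greedy patterns, this cuts at the first occurrence of the word.

```r
# Hypothetical helper: keep everything up to and including the first
# literal occurrence of `word`; return the string unchanged if it is absent.
keep_through <- function(string, word) {
  m <- regexpr(word, string, fixed = TRUE)  # match start, or -1 if no match
  start <- as.integer(m)
  if (start == -1L) return(string)
  substr(string, 1L, start + attr(m, "match.length") - 1L)
}

# Small illustrative data frame; the last row is made up to show a
# word containing a regex metacharacter (".")
dat2 <- data.frame(
  String = c("tasty red apple number 1", "tasty banana and peach",
             "costs 1.5 usd today"),
  Word   = c("apple", "banana", "1.5"),
  stringsAsFactors = FALSE
)

dat2$After <- mapply(keep_through, dat2$String, dat2$Word, USE.NAMES = FALSE)
print(dat2)
```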
Data:
dat <- structure(list(String = c("tasty red apple number 1", "tasty red apple and banana", 
"tasty banana and apple and peach", "tasty banana and peach", 
"tasty peach and apple"), Word = c("apple", "apple", "apple", 
"banana", "peach")), class = "data.frame", row.names = c(NA, 
-5L))
ttvkxqim #3

Try this:

df1 %>%
  mutate(After = str_replace(String, str_c("(.*\\b", Word, "\\b).*"), "\\1"))
                            String   Word                  After
1         tasty red apple number 1  apple        tasty red apple
2       tasty red apple and banana  apple        tasty red apple
3 tasty banana and apple and peach  apple tasty banana and apple
4           tasty banana and peach banana           tasty banana
5            tasty peach and apple  peach            tasty peach

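Here we (i) wrap `Word` in word boundaries (`\\b`) so that longer words containing the `Word` value (for example "dapple" for "apple") are not matched, (ii) put that part of the pattern in parentheses to make it a capture group, and (iii) refer to that group in the replacement argument of `str_replace`, so that anything after the capture group (matched by the trailing `.*`) is dropped.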
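
To illustrate what the word boundaries change, here is a small sketch comparing the pattern with and without `\\b`. The input string is hypothetical and chosen so that "dapple" appears after "apple".

```r
library(stringr)

# Hypothetical input: "dapple" contains "apple" and comes after it
s <- "red apple and dapple grey"
w <- "apple"

# Without boundaries the greedy group runs up to the "apple" inside "dapple"
without_b <- str_replace(s, str_c("(.*", w, ").*"), "\\1")

# With boundaries only the standalone word "apple" can end the group
with_b <- str_replace(s, str_c("(.*\\b", w, "\\b).*"), "\\1")

print(without_b)  # "red apple and dapple"
print(with_b)     # "red apple"
```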

ohfgkhjo #4

You can also use `str_extract` instead of `str_replace` for slightly simpler syntax, since no capture group or backreference is needed:

df1 |>
  mutate(After = str_extract(String, str_c(".*\\b", Word, "\\b")))
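
The two functions differ when a string does not contain its word: `str_extract` returns `NA`, while `str_replace` leaves the string unchanged. A short sketch of that difference, using a made-up pairing where the second string lacks its word:

```r
library(stringr)

s <- c("tasty red apple number 1", "tasty banana and peach")
w <- c("apple", "apple")  # hypothetical: the second string has no "apple"

# No match yields NA
extracted <- str_extract(s, str_c(".*\\b", w, "\\b"))

# No match leaves the original string as it is
replaced <- str_replace(s, str_c("(.*\\b", w, "\\b).*"), "\\1")

print(extracted)
print(replaced)
```

Which behaviour is preferable depends on whether unmatched rows should be flagged as missing or passed through untouched.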
