R: replace everything after the word given in another column

Posted by iklwldmw on 2023-03-27 in Other


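I have a data frame that looks like this:

| String | Word |
| ------ | ---- |
| tasty red apple number 1 | apple |
| tasty red apple and banana | apple |
| tasty banana and apple and peach | apple |
| tasty banana and peach | banana |
| tasty peach and apple | peach |

I want to remove every word that comes after the word given in the Word column, while keeping that word itself:

| String | Word | After |
| ------ | ---- | ----- |
| tasty red apple number 1 | apple | tasty red apple |
| tasty red apple and banana | apple | tasty red apple |
| tasty banana and apple and peach | apple | tasty banana and apple |
| tasty banana and peach | banana | tasty banana |
| tasty peach and apple | peach | tasty peach |

Does anyone know how to do this?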

string <- c("tasty red apple number 1", "tasty red apple and banana", "tasty banana and apple and peach", "tasty banana and peach", "tasty peach and apple")
word <- c("apple", "apple", "apple", "banana", "peach")
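The answers below work on a data frame (called df1 or dat) with columns String and Word rather than on the two vectors. A minimal sketch of that glue step, assuming the column names the answers use:

```r
# The question's input vectors (written with a plain ASCII c())
string <- c("tasty red apple number 1", "tasty red apple and banana",
            "tasty banana and apple and peach", "tasty banana and peach",
            "tasty peach and apple")
word <- c("apple", "apple", "apple", "banana", "peach")

# Combine them into the shape the answers expect;
# String and Word are the column names used in the answers
df1 <- data.frame(String = string, Word = word)
str(df1)
```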

Source: https://stackoverflow.com/questions/75848998/replace-everything-after-word-given-in-the-other-column

4 answers


pbpqsu0x1#

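We can capture the characters up to and including the Word value as a group ((...)), and then use a backreference to that captured group (\\1) as the replacement in str_replace. The trailing .* matches the rest of the characters, which are discarded. str_replace is also vectorized over the pattern, so we don't need any loop.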

library(dplyr)
library(stringr)
# sprintf() builds one pattern per row, e.g. "(.*apple).*"; "\\1" keeps only the captured group
df1 %>%
   mutate(After = str_replace(String, sprintf("(.*%s).*", Word), "\\1"))

Output

                            String   Word                  After
1         tasty red apple number 1  apple        tasty red apple
2       tasty red apple and banana  apple        tasty red apple
3 tasty banana and apple and peach  apple tasty banana and apple
4           tasty banana and peach banana           tasty banana
5            tasty peach and apple  peach            tasty peach

Data

df1 <- structure(list(String = c("tasty red apple number 1",
 "tasty red apple and banana", 
"tasty banana and apple and peach", "tasty banana and peach", 
"tasty peach and apple"), Word = c("apple", "apple", "apple", 
"banana", "peach")), class = "data.frame", row.names = c(NA, 
-5L))
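To see what sprintf hands to str_replace, the pattern vector can be built on its own. A minimal sketch, assuming the df1 data above; the string "apple pie and apple juice" is a hypothetical input added here to show that the greedy (.*apple) group keeps everything up to the last occurrence of the word.

```r
library(stringr)

df1 <- data.frame(
  String = c("tasty red apple number 1", "tasty red apple and banana",
             "tasty banana and apple and peach", "tasty banana and peach",
             "tasty peach and apple"),
  Word = c("apple", "apple", "apple", "banana", "peach")
)

# One pattern per row: the Word value is spliced into "(.*%s).*"
patterns <- sprintf("(.*%s).*", df1$Word)
print(patterns)

# str_replace() is vectorised over both string and pattern, so String[i]
# is matched against patterns[i] without an explicit loop
after <- str_replace(df1$String, patterns, "\\1")
print(after)

# Hypothetical input with two occurrences: the greedy .* inside the group
# backtracks only as far as the last "apple"
last_cut <- str_replace("apple pie and apple juice", "(.*apple).*", "\\1")
print(last_cut)
```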

2023-03-27

2w2cym1i2#

Use a lookbehind in gsub inside mapply to remove the unwanted part of each string.

transform(dat, After=mapply(\(x, y) gsub(sprintf('(?<=%s).*',  x), '', y, perl=TRUE), Word, String))
#                             String   Word                  After
# 1         tasty red apple number 1  apple        tasty red apple
# 2       tasty red apple and banana  apple        tasty red apple
# 3 tasty banana and apple and peach  apple tasty banana and apple
# 4           tasty banana and peach banana           tasty banana
# 5            tasty peach and apple  peach            tasty peach

Data:

dat <- structure(list(String = c("tasty red apple number 1", "tasty red apple and banana", 
"tasty banana and apple and peach", "tasty banana and peach", 
"tasty peach and apple"), Word = c("apple", "apple", "apple", 
"banana", "peach")), class = "data.frame", row.names = c(NA, 
-5L))
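A base R sketch of the same lookbehind idea, using sub() in place of the answer's gsub() (the pattern runs to the end of the string, so there is only one match to remove either way). The string "apple pie and apple juice" is a hypothetical input added to show how this differs from the greedy capture-group approach: the lookbehind fires right after the first occurrence of the word, so everything after that is removed.

```r
strings <- c("tasty red apple number 1", "tasty red apple and banana",
             "tasty banana and apple and peach", "tasty banana and peach",
             "tasty peach and apple")
words <- c("apple", "apple", "apple", "banana", "peach")

# perl = TRUE is needed: base R's default regex engine does not support
# lookbehind assertions such as (?<=apple)
cut_after <- function(w, s) sub(sprintf("(?<=%s).*", w), "", s, perl = TRUE)

# mapply() pairs words[i] with strings[i]; USE.NAMES = FALSE drops the names
# mapply would otherwise copy from the character vector words
after <- mapply(cut_after, words, strings, USE.NAMES = FALSE)
print(after)

# Hypothetical input: the match starts right after the FIRST "apple"
first_cut <- cut_after("apple", "apple pie and apple juice")
print(first_cut)
```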

2023-03-27

ttvkxqim3#

Try this:

df1 %>%
  mutate(After = str_replace(String, str_c("(.*\\b", Word, "\\b).*"), "\\1"))
                            String   Word                  After
1         tasty red apple number 1  apple        tasty red apple
2       tasty red apple and banana  apple        tasty red apple
3 tasty banana and apple and peach  apple tasty banana and apple
4           tasty banana and peach banana           tasty banana
5            tasty peach and apple  peach            tasty peach

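Here we (i) wrap Word in word boundaries \\b to prevent longer words that contain the Word value (e.g., "dapple" contains "apple") from being matched; (ii) wrap that substring in parentheses to force it into a capture group, which we then (iii) reference in the str_replace replacement argument, while everything after the capture group (.*) is dropped.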
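A small sketch of what the \\b boundaries change, using a hypothetical input string in which "apple" appears once as a standalone word and once inside "dapple":

```r
library(stringr)

# Hypothetical input: "apple" as a word, then again inside "dapple"
s <- "an apple and a dapple pony"

# Without \\b the greedy .* stops at the last "apple" substring, inside "dapple"
no_boundary <- str_replace(s, "(.*apple).*", "\\1")
print(no_boundary)

# With \\b on both sides only the standalone word "apple" can match
with_boundary <- str_replace(s, "(.*\\bapple\\b).*", "\\1")
print(with_boundary)
```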

2023-03-27

ohfgkhjo4#

You can also use str_extract instead of str_replace for slightly simpler syntax:

df1 |>
  mutate(After = str_extract(String, str_c(".*\\b", Word, "\\b")))
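One practical difference between the two stringr approaches shows up when a row's Word does not occur in its String: str_replace() returns the string unchanged, while str_extract() returns NA. A minimal sketch, assuming the df1 data from the answers plus one hypothetical extra row ("tasty plum" with Word "apple") added only for illustration:

```r
library(dplyr)
library(stringr)

df1 <- data.frame(
  String = c("tasty red apple number 1", "tasty red apple and banana",
             "tasty banana and apple and peach", "tasty banana and peach",
             "tasty peach and apple"),
  Word = c("apple", "apple", "apple", "banana", "peach")
)

# Hypothetical extra row whose Word does not occur in its String
df2 <- rbind(df1, data.frame(String = "tasty plum", Word = "apple"))

res <- df2 |>
  mutate(
    # No match: the original string is passed through unchanged
    Replaced  = str_replace(String, str_c("(.*\\b", Word, "\\b).*"), "\\1"),
    # No match: NA
    Extracted = str_extract(String, str_c(".*\\b", Word, "\\b"))
  )
print(res)
```

Which behaviour is preferable depends on whether a missing word should be flagged (NA) or silently passed through.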

2023-03-27
