R: check whether a substring exists in another column

czq61nw1 posted on 2023-03-20 in Other

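I have a column of unique email domains (e.g. @comp1.com, @comp2.com, ...).
I have another dataset with an email column that contains many email addresses. Some of their domains are present in the domain df and some are not.
I want to create a new column, target_email, that is TRUE if the email belongs to one of the target domains and FALSE otherwise.
I tried: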

df$target_email<-grepl(domain$Email, df$Email)

df$target_email<-ifelse(grepl(domain$Email, df$Email), "TRUE", "FALSE")

df$target_email<-sapply(domain$Email, \(string) any(grepl(string, df$target_email, fixed = TRUE)))

These all return errors:

argument 'pattern' has length > 1 and only the first element will be used

replacement has 160 rows, data has 28446

Edit: suppose we want to isolate the emails that belong to FAANG companies

df$email<-c("matt@apple.com", "tash@amazon.com", "a@coke.com", "b@netflix.com", "c@pepsi.com")

domains$email<-c("apple.com", "netflix.com", "amazon.com", "google.com")

I want:
df$target_email<-c("True", "True", "False", "True", "False")

dbf7pr2w1#

library(tidyverse)

domains <- c("apple.com", "netflix.com", "amazon.com", "google.com")

df <- tibble(
  email = c("matt@apple.com", "tash@amazon.com", "a@coke.com", "b@netflix.com", "c@pepsi.com")
)

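# Escape the dots, require a preceding "@", and anchor every alternative at the
# end of the string (str_escape() needs stringr >= 1.5.0)
pattern <- str_c("@(", str_flatten(str_escape(domains), collapse = "|"), ")$")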

df |> 
  mutate(target_email = str_detect(email, pattern))
#> # A tibble: 5 × 2
#>   email           target_email
#>   <chr>           <lgl>       
#> 1 matt@apple.com  TRUE        
#> 2 tash@amazon.com TRUE        
#> 3 a@coke.com      FALSE       
#> 4 b@netflix.com   TRUE        
#> 5 c@pepsi.com     FALSE

Created on 2023-03-20 with reprex v2.0.2
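As a possible alternative, the same check can be sketched in base R without building a regular expression. grepl() accepts only a single pattern, which is why passing the whole domain vector produced the warning in the question. The sketch below instead extracts the part after the "@" and tests it for exact membership with %in%. The sample data and the helper names email_domain and target_domains are assumptions made for illustration; they are not part of the original answer.

```r
# Hypothetical sample data mirroring the FAANG example from the question
domains <- c("apple.com", "netflix.com", "amazon.com", "google.com")
df <- data.frame(
  email = c("matt@apple.com", "tash@amazon.com", "a@coke.com",
            "b@netflix.com", "c@pepsi.com")
)

# Drop everything up to and including the last "@", leaving only the domain
email_domain <- tolower(sub(".*@", "", df$email))

# Normalise the targets: remove an optional leading "@" and lower-case them
target_domains <- tolower(sub("^@", "", domains))

# Exact membership test, vectorised over every row of df
df$target_email <- email_domain %in% target_domains

print(df)
```

Because the comparison is exact, an address such as x@pineapple.com would not be counted as an apple.com address, and the result has one element per row of df, so the "replacement has 160 rows" error does not arise.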
