R: How do I perform a reverse anti-join?

Posted by z9gpfhce on 2023-04-27 in Other

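I have a variable that looks like this:

I want the victim's nationality to "pop out", so that "Ukrainian state entities" would simply appear as "Ukraine". There are more than 700 entries covering many different countries (Belgium, Myanmar, etc.).
I have no experience with text mining (or with R, to be honest), so I took things I saw in class and tried to put them together.
Here is my reasoning:
1. Split "Victims" into individual words: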

d_tokenized = state_cyberattacks_csv %>%
  filter(Category == 'Government')%>%
  select(Date, Sponsor, Victims) %>%
  unnest_tokens(word, Victims)

2. Remove the words that do not appear in the "Demonym" column of the Demonym data frame:

d_tokenized_s = d_tokenized %>%
  anti_join(demonym_list, by != "Demonym")

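I know this doesn't work because of the "!=", which makes no sense there. I tried to find other approaches using joins, str_extract, str_subset, etc., but honestly I don't understand what they do.
Which function should I use?
There is also a second problem: some entries contain the country name directly rather than a demonym, and those would be dropped if I found an anti-join-like way to remove everything that does not match "Demonym".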

w8f9ii69 #1

library(tidyverse)

df <- structure(
  list(Victims = c(
    "Ukrainian state entities", "Russian and Belarusian websites were targeted, including th...",
    "Belgian Federal Public Service Interior", "Ukrainian government agencies",
    "Government agencies of EU member states", "Two research institutes run by Rostec",
    "Albanian government networks", "Cryptocurrency applications",
    "VMware Horizon servers", "Cryptocurrency company employees",
    "Individual suspects within Canadian police investigations."
  )),
  class = "data.frame",
  row.names = c(NA, -11L)
)

If only "Ukrainian state entities" needs to be replaced:

df |> mutate(Victims = str_replace(Victims, "Ukrainian state entities", "Ukraine"))
#>                                                           Victims
#> 1                                                         Ukraine
#> 2  Russian and Belarusian websites were targeted, including th...
#> 3                         Belgian Federal Public Service Interior
#> 4                                   Ukrainian government agencies
#> 5                         Government agencies of EU member states
#> 6                           Two research institutes run by Rostec
#> 7                                    Albanian government networks
#> 8                                     Cryptocurrency applications
#> 9                                          VMware Horizon servers
#> 10                               Cryptocurrency company employees
#> 11     Individual suspects within Canadian police investigations.

If every entry containing "Ukrainian" needs to be replaced:

df |> mutate(Victims = case_when(
  str_detect(Victims, "Ukrainian") ~ "Ukraine",
  TRUE ~ Victims)
)
#>                                                           Victims
#> 1                                                         Ukraine
#> 2  Russian and Belarusian websites were targeted, including th...
#> 3                         Belgian Federal Public Service Interior
#> 4                                                         Ukraine
#> 5                         Government agencies of EU member states
#> 6                           Two research institutes run by Rostec
#> 7                                    Albanian government networks
#> 8                                     Cryptocurrency applications
#> 9                                          VMware Horizon servers
#> 10                               Cryptocurrency company employees
#> 11     Individual suspects within Canadian police investigations.
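
With 700+ entries, writing one case_when() branch per country does not scale. A possible generalization, offered only as a sketch, is to build a single regular expression from a demonym lookup table and join the extracted demonym back to its country. The table demonym_lookup and its column names (Demonym, Country) are assumptions made for illustration; the real demonym_list from the question may be laid out differently.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical lookup table (assumed layout, not taken from the question).
demonym_lookup <- tribble(
  ~Demonym,    ~Country,
  "Ukrainian", "Ukraine",
  "Belgian",   "Belgium",
  "Albanian",  "Albania",
  "Canadian",  "Canada"
)

# Small illustrative subset of the Victims column.
victims <- tibble(Victims = c(
  "Ukrainian state entities",
  "Belgian Federal Public Service Interior",
  "Cryptocurrency applications",
  "Individual suspects within Canadian police investigations."
))

# One pattern that matches any known demonym as a whole word.
pattern <- str_c("\\b(", str_c(demonym_lookup$Demonym, collapse = "|"), ")\\b")

result <- victims |>
  # First demonym found in each string, or NA if none matches.
  mutate(Demonym = str_extract(Victims, pattern)) |>
  # Attach the country; rows without a demonym keep Country = NA.
  left_join(demonym_lookup, by = "Demonym")

result
```

Note that str_extract() returns only the first match, so an entry such as "Russian and Belarusian websites ..." would yield a single country; str_extract_all() would be needed to capture several.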

Created on 2023-04-21 with reprex v2.0.2
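
As for the "reverse anti-join" in the title: in dplyr, the complement of anti_join() is semi_join(), which keeps only the rows of the left table that have a match in the right table. The sketch below applies it to the tokenized approach from the question. The data, the Country column, and the helper column word are assumptions for illustration. Listing plain country names in the lookup alongside the demonyms is one possible way to keep entries that name the country directly.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Illustrative data (hypothetical dates and a small subset of victims).
attacks <- tibble(
  Date = as.Date(c("2023-01-10", "2023-02-05", "2023-03-01")),
  Victims = c(
    "Ukrainian state entities",
    "Government agencies of Ukraine",
    "VMware Horizon servers"
  )
)

# Hypothetical lookup: demonyms AND country names both map to a country.
demonym_list <- tibble(
  Demonym = c("Ukrainian", "Ukraine", "Belgian", "Belgium"),
  Country = c("Ukraine", "Ukraine", "Belgium", "Belgium")
) |>
  # unnest_tokens() lowercases tokens by default, so match on a lowercase key.
  mutate(word = tolower(Demonym))

d_tokenized <- attacks |>
  unnest_tokens(word, Victims)

# semi_join(): keep only the tokens that appear in the lookup.
d_matched <- d_tokenized |>
  semi_join(demonym_list, by = "word")

# inner_join(): same rows, plus the Country column from the lookup.
d_countries <- d_tokenized |>
  inner_join(demonym_list, by = "word")

d_countries
```

semi_join() only filters, whereas inner_join() also brings in the country name, which is probably closer to what the question is after.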
