R: Remove duplicate strings from a list column

rvpgvaaj posted on 2023-02-27 in Other

I have this data frame:

df <- structure(list(
  class = c("Großbrittanien", "Rest Europa"),
  countries = list(c("United Kingdom", "United Kingdom"), "Spain")
), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"))

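It looks like this:

# A tibble: 2 × 2
  class          countries
  <chr>          <list>   
1 Großbrittanien <chr [2]>
2 Rest Europa    <chr [1]>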

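I want to turn the `countries` list column into a character column and remove the duplicate entries, so that United Kingdom appears only once. I'm a bit confused about how to do this with dplyr syntax.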

Answer 1 (w8rqjzmb)

You can unnest `countries` and then drop the duplicate rows:

library(tidyverse)

df %>%
  unnest(countries) %>%
  distinct()

# # A tibble: 2 × 2
#   class          countries     
#   <chr>          <chr>         
# 1 Großbrittanien United Kingdom
# 2 Rest Europa    Spain
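Note that `unnest()` followed by `distinct()` gives one row per distinct class/country pair, so it only ends up with a single value per class when each class has at most one distinct country. A minimal sketch, assuming a modified version of the sample data in which "Rest Europa" also contains "Portugal" (that extra entry is an illustrative assumption, not part of the question):

```r
library(dplyr)
library(tidyr)

# Question's data with "Portugal" added to "Rest Europa"
# (illustrative assumption, not the original data).
df2 <- tibble(
  class = c("Großbrittanien", "Rest Europa"),
  countries = list(c("United Kingdom", "United Kingdom"), c("Spain", "Portugal"))
)

long <- df2 %>%
  unnest(countries) %>%  # one row per list element: 4 rows
  distinct()             # the repeated "United Kingdom" row is dropped: 3 rows

print(long)  # "Rest Europa" now spans two rows (Spain and Portugal)
```

To get back to one comma-separated string per class, the long result still has to be collapsed per group, e.g. with `group_by()` + `summarise()`.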
Answer 2 (gmxoilav)

Alternatively, without `unnest`, apply `unique` within each class before converting to a string.

With grouping:
library(dplyr)

df |>
  mutate(countries = toString(unique(unlist(countries))), .by = class)

# Note: If you're using `dplyr < v.1.1.0`, use `group_by`/`ungroup`.
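For dplyr versions before 1.1.0, where `mutate()` has no `.by` argument, the grouping approach can be sketched with `group_by()`/`ungroup()` instead. The sample data below (with Portugal) is the same assumption this answer uses; the native pipe `|>` needs R 4.1 or later.

```r
library(dplyr)

# Sample data with a non-duplicate entry (Portugal), as in this answer.
df <- tibble(
  class = c("Großbrittanien", "Rest Europa"),
  countries = list(c("United Kingdom", "United Kingdom"), c("Spain", "Portugal"))
)

res <- df |>
  group_by(class) |>
  # Each group holds one row: flatten its list, drop duplicates, join with ", "
  mutate(countries = toString(unique(unlist(countries)))) |>
  ungroup()  # drop the grouping so later verbs act on the whole table

print(res)
```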
Using purrr:
library(dplyr)
library(purrr)

df |>
  mutate(countries = map_chr(countries, ~ toString(unique(.))))
Output:
# A tibble: 2 × 2
  class          countries     
  <chr>          <chr>         
1 Großbrittanien United Kingdom
2 Rest Europa    Spain, Portugal
Data (with a non-duplicate entry, Portugal, added):
df <- structure(list(
  class = c("Großbrittanien", "Rest Europa"),
  countries = list(c("United Kingdom", "United Kingdom"), c("Spain", "Portugal"))
), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"))
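If neither dplyr nor purrr is needed, the same per-element operation can be sketched in base R with `vapply()`, which, like `map_chr()`, enforces exactly one string per element. Here the sample data is stored as a plain `data.frame` with a list column (an assumption to keep the example dependency-free).

```r
# Same sample data as above, as a plain data.frame with a list column.
df <- structure(
  list(
    class = c("Großbrittanien", "Rest Europa"),
    countries = list(c("United Kingdom", "United Kingdom"), c("Spain", "Portugal"))
  ),
  row.names = c(NA, -2L),
  class = "data.frame"
)

# For each list element: drop duplicates, then join with ", ".
# character(1) makes vapply() fail if any element doesn't yield one string.
df$countries <- vapply(df$countries, function(x) toString(unique(x)), character(1))

print(df)
```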
Answer 3 (gpfsuwkq)

Here is a version using dplyr syntax (plus `unnest()` from tidyr):

library(dplyr)
library(tidyr)  # unnest() comes from tidyr, not dplyr

df %>%
  unnest(countries) %>%
  distinct(class, countries) %>%
  group_by(class) %>%
  summarise(countries = paste(countries, collapse = ", "))
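Since `toString(x)` is by default `paste(x, collapse = ", ")`, this produces the same strings as the `toString()`-based answer. A sketch checking that on the sample data with Portugal added (an assumption borrowed from the second answer, so that one class has more than one country):

```r
library(dplyr)
library(tidyr)

df <- tibble(
  class = c("Großbrittanien", "Rest Europa"),
  countries = list(c("United Kingdom", "United Kingdom"), c("Spain", "Portugal"))
)

# Long form, de-duplicated, then collapsed back to one string per class
res <- df %>%
  unnest(countries) %>%
  distinct(class, countries) %>%
  group_by(class) %>%
  summarise(countries = paste(countries, collapse = ", "))

# Same strings computed directly from each list element with toString()
via_tostring <- vapply(df$countries, function(x) toString(unique(x)), character(1))

print(res)
```

One design difference: `summarise()` returns rows sorted by `class`, whereas the `mutate()`-based approaches keep the original row order. With this data the two orders coincide.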
