Get unique values from rows with comma-separated values, based on a specific grouping column in R

Posted by m1m5dgzv 12 months ago in Other

Suppose I have:

group     X                 Y                   Z
A         cat, dog          dog, fox        
A         fox, chicken      dog, fox, chicken
A
B         fox, dog 
B         fox
B                                               bunny

I want to summarize the unique animals in each group across these three columns:

group     animal
A         cat, dog, fox, chicken
B         bunny, dog, fox

The order of the animals doesn't matter, as long as each one appears only once per group. I tried:

df %>%
  group_by(group) %>%
  separate_rows("X", sep = ",") %>%
  distinct() %>%
  summarise(X = toString(X))

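This doesn't work even for a single column (I get a bunch of NAs).
I was thinking of something along the lines of:

df %>%
  group_by(group) %>%
  summarise_at(vars("X", "Y", "Z"), sum)

That works for numeric variables that aren't comma-separated (one value per row).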


Answer #1 (5n0oy7gb)

One approach:

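library(dplyr)   # .by in summarise() needs dplyr >= 1.1.0
library(tidyr)

df %>%
   pivot_longer(-group) %>%     # stack X, Y, Z into a single `value` column
   separate_rows(value) %>%     # one animal per row
   summarise(animal = toString(unique(value[nzchar(value)])), .by = group)  # drop "" and de-duplicate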

# A tibble: 2 × 2
  group animal                
  <chr> <chr>                 
1 A     cat, dog, fox, chicken
2 B     fox, dog, bunny

In base R, you could do:

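# Split comma-separated strings into trimmed pieces and keep each animal once
fn <- function(x) {
  toString(unique(scan(text=x, what="", sep=',', strip.white = TRUE, quiet = TRUE)))
}
# stack() puts X, Y, Z into one `values` column; the group column is recycled alongside it
aggregate(values~group, cbind(df[1], stack(df,-1)), fn)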

  group                 values
1     A cat, dog, fox, chicken
2     B        fox, dog, bunny
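
To try the base R idea end to end, here is a minimal sketch that rebuilds the question's table and collapses it per group with `strsplit()` instead of `scan()`. It assumes the blank cells are empty strings (the question does not say whether they are `""` or `NA`), and `unique_animals` is a hypothetical helper name, not something from the answer above.

```r
# The question's table; blank cells are assumed to be empty strings
df <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  X = c("cat, dog", "fox, chicken", "", "fox, dog", "fox", ""),
  Y = c("dog, fox", "dog, fox, chicken", "", "", "", ""),
  Z = c("", "", "", "", "", "bunny"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split every cell on commas, trim whitespace,
# drop blank/NA pieces, and keep each animal once
unique_animals <- function(cells) {
  pieces <- trimws(unlist(strsplit(cells, ",", fixed = TRUE)))
  toString(unique(pieces[!is.na(pieces) & nzchar(pieces)]))
}

# unlist() walks the columns in order (all of X, then Y, then Z),
# so the group labels are repeated once per animal column
cells_by_group <- split(
  unlist(df[-1], use.names = FALSE),
  rep(df$group, times = ncol(df) - 1)
)

result <- data.frame(
  group = names(cells_by_group),
  animal = unname(vapply(cells_by_group, unique_animals, character(1))),
  row.names = NULL
)
print(result)
```

Because the helper filters out both `NA` and `""`, the same sketch should work whichever way the blank cells were read in.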

Answer #2 (gtlvzcf8)

Here is a simple approach (it assumes empty cells are NA):

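library(dplyr)
library(tidyr)
library(stringr)

df %>%
  # pivot first: splitting several columns at once needs matching piece counts per row
  pivot_longer(-group, values_drop_na = TRUE) %>%
  separate_rows(value) %>%
  group_by(group) %>%
  summarise(value = str_c(unique(value), collapse = ", "))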

Toy data:

df <- data.frame(
  group = c("A", "A", "B", "B"),
  X = c("cat, dog", "chicken","fox, dog", "fox"),
  Y = c("dog, fox", "fox, dog", NA, NA),
  Z = c(NA, NA, "bunny", NA)
)
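
Since this answer relies on empty cells being NA, data that was read in with blank strings would need converting first. A minimal base R sketch of that step follows; the data frame `raw` and its contents are made up for illustration.

```r
# Illustrative data where blank cells are "" or whitespace only (assumed)
raw <- data.frame(
  group = c("A", "B"),
  X = c("cat, dog", ""),
  Y = c("  ", "fox"),
  stringsAsFactors = FALSE
)

# Replace cells that are empty after trimming with NA, column by column;
# cells that are already NA are left untouched
animal_cols <- setdiff(names(raw), "group")
raw[animal_cols] <- lapply(raw[animal_cols], function(col) {
  col[!nzchar(trimws(col))] <- NA_character_
  col
})
print(raw)
```

After this step, options such as `values_drop_na = TRUE` can discard the blanks.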

Answer #3 (vuktfyat)

We can also unite() the columns, then split on "," and finally join the unique values:

library(tidyr)
library(dplyr)
library(stringr)

df |> 
    unite("animals",
          c(X, Y, Z),
          sep = ",",
          remove = TRUE,
          na.rm = TRUE
          ) |> 
    summarise(animals = str_split(animals, ",\\s*") |> 
                  unlist() |> 
                  unique() |> 
                  toString(),
              .by = group
              )

  group                animals
1     A cat, dog, fox, chicken
2     B        fox, dog, bunny
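
The same unite-then-split idea can be sketched without any packages. This is only a rough base R analogue, assuming the NA-based toy data from the previous answer; the names `united` and `animals` are illustrative.

```r
# Toy data with NA for empty cells (same shape as the earlier toy data)
df <- data.frame(
  group = c("A", "A", "B", "B"),
  X = c("cat, dog", "chicken", "fox, dog", "fox"),
  Y = c("dog, fox", "fox, dog", NA, NA),
  Z = c(NA, NA, "bunny", NA),
  stringsAsFactors = FALSE
)

# "Unite": join the non-missing cells of each row with a comma
united <- apply(df[c("X", "Y", "Z")], 1, function(row) {
  paste(row[!is.na(row)], collapse = ",")
})

# Split on a comma plus optional whitespace, then keep unique values per group
animals <- tapply(united, df$group, function(u) {
  toString(unique(unlist(strsplit(u, ",\\s*"))))
})
print(animals)
```

`tapply()` returns a named array here, so `animals[["A"]]` gives the string for group A.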
