Why doesn't using c() on a list column work with dplyr summarize?

Posted by gopyfrb3 on 2023-04-03 in Other

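I have a list column, and I want to use c() within each group to combine those lists inside summarize. This should produce one row per group, but it doesn't (note that the code is written for dplyr >= 1.1.0):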

library(dplyr)

df <- tibble::tibble(group = c("A", "A", "B"),
                     list_col = list(list("One"), list("Two"), list("Three")))

df |> 
  summarize(list_col = c(list_col),
            .by = group)

This returns:

group list_col  
  <chr> <list>    
1 A     <list [1]>
2 A     <list [1]>
3 B     <list [1]>
Warning message:
Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
i Please use `reframe()` instead.
i When switching from `summarise()` to `reframe()`, remember that `reframe()` always
  returns an ungrouped data frame and adjust accordingly.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.

Expected output

output <- tibble::tibble(group = c("A", "B"),
               list_col = list(list("One", "Two"), list("Three")))

  group list_col  
  <chr> <list>    
1 A     <list [2]>
2 B     <list [1]>

output$list_col[[1]]
[[1]]
[1] "One"

[[2]]
[1] "Two"

Alternative

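You can do something like the code below. But A) it changes the type of the column's elements (each becomes a character vector instead of a list), and B) I specifically want to know why c() doesn't work: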

df |>
  summarize(list_col = list(unlist(list_col)),
            .by = group)

  group list_col 
  <chr> <list>   
1 A     <chr [2]>
2 B     <chr [1]>

In the first group (A), I expected the following to happen, combining the two lists into one list:

c(list("One"), list("Two"))
[[1]]
[1] "One"

[[2]]
[1] "Two"

So why doesn't this work? Is it a bug, or is there some syntax I'm missing?

wgx48brx  #1

library(dplyr)
out <- df %>% 
  reframe(list_col = list(as.list(unlist(list_col))), .by = group)
Output:
> out
# A tibble: 2 × 2
  group list_col  
  <chr> <list>    
1 A     <list [2]>
2 B     <list [1]>
> out$list_col[[1]]
[[1]]
[1] "One"

[[2]]
[1] "Two"

The OP's expected output:

> output$list_col[[1]]
[[1]]
[1] "One"

[[2]]
[1] "Two"

On the difference between c and unlist: the default of the recursive argument is FALSE for c and TRUE for unlist.
c(..., recursive = FALSE, use.names = TRUE)
unlist(x, recursive = TRUE, use.names = TRUE)
So the basic difference is:

> c(list("a"))
[[1]]
[1] "a"

> unlist(list("a"))
[1] "a"
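As a small sketch of how the recursive argument drives this difference, the base R snippet below flips the defaults of both functions. The variable names are only illustrative.

```r
nested <- list(list("a"), list("b"))

# unlist() with its default recursive = TRUE flattens all the way
# down to an atomic character vector.
fully_flat <- unlist(nested)

# With recursive = FALSE, unlist() removes only one level of nesting,
# so the result is still a list.
one_level <- unlist(nested, recursive = FALSE)

# c() defaults to recursive = FALSE; switching it to TRUE makes it
# descend through the list and return an atomic vector.
c_recursive <- c(list("a", "b"), recursive = TRUE)

str(fully_flat)
str(one_level)
str(c_recursive)
```

The middle case, unlist(x, recursive = FALSE), is the one that keeps list elements as list elements while dropping one layer.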

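When a single list holds more than one element, the variadic ... argument of c still has length 1, because the whole list is passed to c as one argument, so c returns it unchanged: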

> c(list("a", "b"))
[[1]]
[1] "a"

[[2]]
[1] "b"

c does nothing here unless we call it through do.call, which passes each element of the list as a separate argument:

> do.call(c, list("a", "b"))
[1] "a" "b"
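To make the argument-passing point concrete, here is a minimal base R sketch comparing the three call shapes. The names lst, via_do_call, direct and unchanged are illustrative only.

```r
lst <- list("a", "b")

# do.call() spreads the list elements out as separate arguments,
# so this is the same call as c("a", "b").
via_do_call <- do.call(c, lst)
direct <- c(lst[[1]], lst[[2]])

# Passing the list itself gives c() a single argument; with nothing
# to combine it with, the list comes back as it went in.
unchanged <- c(lst)

print(identical(via_do_call, direct))
print(identical(unchanged, lst))
```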

Using the OP's example:

> df$list_col[1:2]
[[1]]
[[1]][[1]]
[1] "One"

[[2]]
[[2]][[1]]
[1] "Two"
> c(df$list_col[1:2])
[[1]]
[[1]][[1]]
[1] "One"

[[2]]
[[2]][[1]]
[1] "Two"

> do.call(c, df$list_col[1:2])
[[1]]
[1] "One"

[[2]]
[1] "Two"
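This also suggests why summarize warned about the number of rows. The sketch below mimics the per-group evaluation in base R with split(), which is only an approximation of what dplyr does internally: within group A the column is a length-2 list, c() returns it unchanged, and a length-2 result means two rows for that group.

```r
list_col <- list(list("One"), list("Two"), list("Three"))
group <- c("A", "A", "B")

# Split the list column by group, roughly as a grouped verb would see it.
per_group <- split(list_col, group)

# c() on each group's slice: still one element per original row.
rows_with_c <- lengths(lapply(per_group, c))

# Wrapping the combined list in list(): exactly one element per group.
rows_with_do_call <- lengths(
  lapply(per_group, function(x) list(do.call(c, x)))
)

print(rows_with_c)
print(rows_with_do_call)
```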

That is, if we do this:

out2 <- df %>% 
  reframe(list_col = list(do.call(c, list_col)), .by = group)
Output:
> out2$list_col[[1]]
[[1]]
[1] "One"

[[2]]
[1] "Two"
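Another option, offered as a sketch rather than as part of the original answer, is unlist(x, recursive = FALSE), which removes exactly one level of nesting and so keeps each element as a list entry. Because the result is wrapped in list(), each group yields one row, so summarise can be used here without the warning. The helper flatten_one_level is a hypothetical name, and the dplyr part assumes dplyr >= 1.1.0 (for .by) is installed.

```r
# Hypothetical helper: drop one level of list nesting.
flatten_one_level <- function(x) unlist(x, recursive = FALSE)

group_a <- list(list("One"), list("Two"))
flat_a <- flatten_one_level(group_a)
str(flat_a)

# Run the dplyr part only when the package is available.
if (requireNamespace("dplyr", quietly = TRUE)) {
  df <- dplyr::tibble(
    group = c("A", "A", "B"),
    list_col = list(list("One"), list("Two"), list("Three"))
  )
  # list(...) makes the per-group result length 1, i.e. one row per group.
  out3 <- dplyr::summarise(
    df,
    list_col = list(flatten_one_level(list_col)),
    .by = group
  )
  print(out3)
}
```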
