Concisely assign vector output of a function to multiple variables in dplyr

Posted by jaxagkaj on 2023-02-14 in Other


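I am trying to assign the vector output of a function (i.e. an output of length greater than 1) to multiple columns in a single operation, or at least as concisely as possible.
Take the range() function as an example: it returns a numeric vector of length 2 holding the minimum and the maximum. Suppose I want to compute range() for each group and assign the output to two columns, min and max.
My current approach is to call summarize(), then add a key column by hand, then reshape to wide format: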

library(magrittr)

# create data
df <- dplyr::tibble(group = rep(letters[1:3], each = 3),
                    x = rpois(9, 10))

df
#> # A tibble: 9 x 2
#>   group     x
#>   <chr> <int>
#> 1 a         8
#> 2 a        12
#> 3 a         8
#> 4 b         9
#> 5 b        14
#> 6 b         9
#> 7 c        11
#> 8 c         6
#> 9 c        12

# summarize gives two lines per group
range_df <- df %>% 
  dplyr::group_by(group) %>% 
  dplyr::summarize(range = range(x)) %>% 
  dplyr::ungroup()

range_df
#> # A tibble: 6 x 2
#>   group range
#>   <chr> <int>
#> 1 a         8
#> 2 a        12
#> 3 b         9
#> 4 b        14
#> 5 c         6
#> 6 c        12

# add key and reshape
range_df %>% 
  dplyr::mutate(key = rep(c("min", "max"), 3)) %>% 
  tidyr::pivot_wider(names_from = key, values_from = range)
#> # A tibble: 3 x 3
#>   group   min   max
#>   <chr> <int> <int>
#> 1 a         8    12
#> 2 b         9    14
#> 3 c         6    12

Is there a more elegant or concise alternative?

Edit:

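Ideally, the alternative solution would handle any number of outputs (for example, if the function returns an output of length 3, it should create 3 variables).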
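A side note on the approach above: as of dplyr 1.1.0, returning more than one row per group from summarize() is deprecated in favor of reframe(). The following is only a sketch of the same summarize-then-reshape idea, assuming dplyr >= 1.1.0 and using the values printed above as fixed data so the result is reproducible. The key is still written out by hand, so its length has to match the length of the function's output.

```r
library(dplyr)
library(tidyr)

# Fixed data (the values printed in the question), so no random seed is needed
df <- tibble(group = rep(letters[1:3], each = 3),
             x = c(8L, 12L, 8L, 9L, 14L, 9L, 11L, 6L, 12L))

res <- df %>%
  # reframe() may return any number of rows per group (assumes dplyr >= 1.1.0);
  # the key column is created in the same step as the values
  reframe(key = c("min", "max"), range = range(x), .by = group) %>%
  # one column per key value
  pivot_wider(names_from = key, values_from = range)

res
#>   group min max
#>   a       8  12
#>   b       9  14
#>   c       6  12
```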

Source: https://stackoverflow.com/questions/75413658/concisely-assign-vector-output-of-a-function-to-multiple-variables-in-dplyr

3 Answers


8ftvxx2r1#

# Write a small function that does the job:

library(tidyverse)
f <- function(x){
  setNames(data.frame(t(range(x))), c('min', 'max'))
}

df %>%
  summarise(across(x, f, .unpack = TRUE), .by=group)
#> # A tibble: 3 × 3
#>   group x_min x_max
#>   <chr> <int> <int>
#> 1 a        10    13
#> 2 b         7    10
#> 3 c        10    12

If you are using an older version of dplyr (note that unpack() comes from tidyr):

df %>%
  group_by(group)%>%
  summarise(across(x, f))%>%
  unpack(x)
#> # A tibble: 3 × 3
#>   group   min   max
#>   <chr> <int> <int>
#> 1 a         6     9
#> 2 b         7    12
#> 3 c         6    10
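The outputs above differ from each other because rpois() is called without a seed. As a reproducible sketch of a related technique, summarise() also accepts an unnamed expression that returns a data frame and turns each of its columns into a result column, so across() and unpack() are not strictly needed for a single input column. The helper name range_cols and the fixed data are assumptions for illustration, not part of the answer above.

```r
library(dplyr)

# Fixed data (the values printed in the question) for a reproducible result
df <- tibble(group = rep(letters[1:3], each = 3),
             x = c(8L, 12L, 8L, 9L, 14L, 9L, 11L, 6L, 12L))

# Hypothetical helper: a one-row tibble with one column per element of range()
range_cols <- function(x) {
  r <- range(x)
  tibble(min = r[1], max = r[2])
}

res <- df %>%
  group_by(group) %>%
  # An unnamed data-frame result is spliced into separate columns
  summarise(range_cols(x))

res
#>   group min max
#>   a       8  12
#>   b       9  14
#>   c       6  12
```

With this form the columns are named min and max, without the x_ prefix that across(..., .unpack = TRUE) adds.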

2023-02-14

bvpmtnay2#

set.seed(1)

df <- dplyr::tibble(group = rep(letters[1:3], each = 3),
                    x = rpois(9, 10))

The function:

g <- function(x) {
  data.frame(min = min(x), max = max(x))
}

Calling g (.unpack requires dplyr 1.1.0 or later):

library(dplyr)

df %>%
  group_by(group) %>%
  summarise(across(x, g, .unpack = TRUE))
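Since no output is shown for this call, here is a small sketch of what the .unpack naming looks like when the same function is applied to more than one column. It assumes dplyr >= 1.1.0; the data frame df2 and its second column y are made up for illustration.

```r
library(dplyr)

# Hypothetical data: x holds the values printed in the question, y is invented
df2 <- tibble(group = rep(letters[1:3], each = 3),
              x = c(8, 12, 8, 9, 14, 9, 11, 6, 12),
              y = c(1, 2, 3, 4, 5, 6, 7, 8, 9))

# Same function as in the answer: a one-row data frame per group
g <- function(x) {
  data.frame(min = min(x), max = max(x))
}

res <- df2 %>%
  # Each input column yields a data-frame column, which .unpack = TRUE
  # expands into columns named <input column>_<output column>
  summarise(across(c(x, y), g, .unpack = TRUE), .by = group)

names(res)
#> [1] "group" "x_min" "x_max" "y_min" "y_max"
```

The prefix keeps the outputs of different input columns apart, which is why the result here is x_min and x_max rather than min and max.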

2023-02-14

py49o6xq3#

Building on onyambu's answer, I wrote a small generic function for this. It may fail in some edge cases.

out2col <- function(x, fun, out_names = c(), add_args = list()) {
    tmp <- do.call(what = fun, args = c(list(x), add_args))
    out <- data.frame(t(tmp))
    if (length(out_names) != 0) {
      if (length(tmp) != length(out_names)) {
        stop("provided names did not match the number of outputs")
      }
      out <- setNames(object = out, nm = out_names)
    } 
    return(out)
}

Example without any additional arguments:

df %>%
  summarise(across(x, out2col, .unpack = TRUE, fun = range),
        .by=group)

Output:

# A tibble: 3 × 3
  group  x_X1  x_X2
  <chr> <int> <int>
1 a         7    10
2 b        11    14
3 c         9    14

Example with additional arguments:

df %>%
  summarise(across(x, out2col, .unpack = TRUE, fun = quantile,
                   out_names = c("min", "max", "Q25"),
                   add_args = list(probs = c(0, 1, 0.25))
                   ),
            .by=group)

Output:

# A tibble: 3 × 4
  group x_min x_max x_Q25
  <chr> <dbl> <dbl> <dbl>
1 a         7    10   7.5
2 b        11    14  11.5
3 c         9    14  10
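One possible variation, sketched here as an assumption rather than as part of the answer: when the function already returns a named vector (as quantile() does), those names can be reused instead of being passed in by hand, with a positional fallback for unnamed outputs such as range(). The helper name vec2cols and the fixed data are hypothetical.

```r
library(dplyr)

# Fixed data (the values printed in the question) for a reproducible result
df <- tibble(group = rep(letters[1:3], each = 3),
             x = c(8, 12, 8, 9, 14, 9, 11, 6, 12))

# Hypothetical helper: turn the vector returned by fun(x, ...) into a
# one-row data frame, reusing the output's own names when it has any
vec2cols <- function(x, fun, ...) {
  res <- fun(x, ...)
  nms <- names(res)
  if (is.null(nms)) {
    # Unnamed output (e.g. range()): fall back to v1, v2, ...
    nms <- paste0("v", seq_along(res))
  }
  out <- data.frame(t(unname(res)))
  names(out) <- nms
  out
}

res <- df %>%
  summarise(across(x,
                   function(v) vec2cols(v, quantile, probs = c(0, 0.5, 1)),
                   .unpack = TRUE),
            .by = group)

names(res)
#> [1] "group"  "x_0%"   "x_50%"  "x_100%"
```

The resulting column names are not syntactic, so they need backticks when referenced later (for example res$`x_50%`). Passing explicit names, as out2col does, avoids that.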

2023-02-14
