如何避免在多参数的dplyr::mutate()调用中重复代码?

niknxzdl  于 2023-05-20  发布在  其他
关注(0)|答案(4)|浏览(103)

问题

我正在从基本R过渡到dplyr
为了遵守DRY(Don't Repeat Yourself)原则,我想缩短以下代码:

mtcars %>% mutate(w = rowMeans(select(., mpg:disp), na.rm = TRUE),
                  x = rowMeans(select(., hp:wt), na.rm = TRUE),
                  y = rowMeans(select(., qsec:am), na.rm = TRUE),
                  z = rowMeans(select(., gear:carb), na.rm = TRUE))

mtcars %>% rowwise() %>% mutate(w = mean(mpg:disp, na.rm = TRUE),
                                x = mean(hp:wt, na.rm = TRUE),
                                y = mean(qsec:am, na.rm = TRUE),
                                z = mean(gear:carb, na.rm = TRUE))
# Note: this one produced an error with my own data

目标

目标是从单个调用计算 Dataframe 中不同尺度的均值。如您所见,rowMeansselectna.rm参数重复了几次(假设我有比本例多几个变量)。

尝试次数

I was trying试to come up with an across()解决方案,

mtcars %>% mutate(across(mpg:carb, mean, .names = "mean_{col}"))

但是它没有产生正确的结果,因为我不知道如何为w:z指定不同的列参数。使用文档示例中的c_across,我们回到重复代码:

mtcars %>% rowwise() %>% mutate(w = mean(c_across(mpg:disp), na.rm = TRUE),
                                x = mean(c_across(hp:wt), na.rm = TRUE),
                                y = mean(c_across(qsec:am), na.rm = TRUE),
                                z = mean(c_across(gear:carb), na.rm = TRUE))

我很想求助于lapply或一个自定义函数,但我觉得这会破坏适应dplyr和新的across()参数的目的。

**编辑:**为了澄清,我想避免多次调用rowMeansselectna.rm
相关线程:123

qv7cva1a

qv7cva1a1#

我们不需要rowwise,而是使用select和矢量化的rowMeans。为了使这更容易,可以创建一个函数

f1 <- function(dat, nm1) {
          dat %>%
            select({{nm1}}) %>%
             rowMeans(na.rm = TRUE)
    }

mtcars %>% mutate(w = f1(dat = ., nm1 = mpg:disp),
                  x = f1(dat = ., nm1 = hp:wt),
                  y = f1(dat = ., nm1 = qsec:am),
                  z = f1(dat = ., nm1= gear:carb)  )
kqqjbcuj

kqqjbcuj2#

使用自定义函数(但组织方式略有不同,以减少重复代码)

mm <- function(data, new_col, cols_to_mut) {
    data %>%
        mutate(
            {{ new_col }} := mean(c_across({{ cols_to_mut }}), na.rm=TRUE)
        )
}

mtcars %>% 
    rowwise %>% 
    mm(w, mpg:disp) %>%
    mm(x, hp:wt) %>%
    mm(y, qsec:am) %>%
    mm(z, gear:carb) %>%
    ungroup
7y4bm7vi

7y4bm7vi3#

考虑使用purrr::reduce2来避免重复

mtcars %>% 
  reduce2(
    c("w","x", "y", "z"),
    c("mpg:disp", "hp:wt","qsec:am","gear:carb"),
    ~ ..1 %>% rowwise %>% mutate(!!..2 := mean(c_across(!!rlang::parse_expr(..3)), na.rm=TRUE)),
    .init = .)


# A tibble: 32 x 15
# Rowwise: 
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb     w     x     y     z
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4  62.3  38.8  5.82   4  
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4  62.3  38.9  6.01   4  
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1  44.9  33.1  6.87   2.5
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1  95.1  38.8  6.81   2  
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2 129.   60.5  5.67   2.5
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1  83.0  37.1  7.07   2  
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4 127.   83.9  5.28   3.5
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2  58.4  23.0  7      3  
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2  55.9  34.0  7.97   3  
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4  64.3  43.5  6.43   4  
# ... with 22 more rows
ryevplcw

ryevplcw4#

dplyr 1.1.0开始使用新的pick()函数的新的略短的解决方案:

library(dplyr)

mtcars %>% mutate(w = rowMeans(pick(mpg:disp), na.rm = TRUE),
                  x = rowMeans(pick(hp:wt), na.rm = TRUE),
                  y = rowMeans(pick(qsec:am), na.rm = TRUE),
                  z = rowMeans(pick(gear:carb), na.rm = TRUE)) %>% 
  head()
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb         w
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4  62.33333
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4  62.33333
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1  44.93333
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1  95.13333
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 128.90000
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1  83.03333
#>                          x        y   z
#> Mazda RX4         38.84000 5.820000 4.0
#> Mazda RX4 Wag     38.92500 6.006667 4.0
#> Datsun 710        33.05667 6.870000 2.5
#> Hornet 4 Drive    38.76500 6.813333 2.0
#> Hornet Sportabout 60.53000 5.673333 2.5
#> Valiant           37.07333 7.073333 2.0

说明:新的pick()函数现在允许我们避免像在select()中那样指定dot参数。
创建于2023-05-19带有reprex v2.0.2

相关问题