dplyr: summarising with multiple functions that take arguments does not give the expected result

Posted by 6tr1vspr on 2023-05-26 in Other

I use the following call so that every summary function runs with na.rm = TRUE:

df %>% summarise_if(is.numeric, list(mean = ~mean(., na.rm = T), 
                                         sd = ~sd(., na.rm = T),
                                         median = ~median(., na.rm = T), 
                                         min = ~min(., na.rm = T), 
                                         max = ~max(., na.rm = T))) %>% t()

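I expected it to give a result like this:

But it returns 2 columns of data, for example:

Why is it implemented this way? Is there an elegant solution?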


xjreopfe #1

I find the code below more efficient, because:
1. the na.rm argument only has to be specified once;
2. you only need a single pivot operation.

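library(dplyr)
library(tidyr)

df %>%
  summarise(across(where(is.numeric),   # predicates must be wrapped in where()
                   list("mean" = mean,
                        "sd" = sd,
                        "median" = median,
                        "min" = min,
                        "max" = max),
                   .names = "{.fn}_{.col}",
                   # forwarded to every function through `...`
                   # (this use of `...` is deprecated since dplyr 1.1.0)
                   na.rm = TRUE)) %>%
  pivot_longer(everything(),
               names_sep = "_",         # assumes no column name contains "_"
               names_to = c(".value", "column"))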
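The "specify na.rm once" idea can also be written without passing na.rm through the `...` of across(). The following is only a sketch under a few assumptions: the built-in airquality data stands in for df, the names fns, fns_na_rm, solar_rad and summary_tbl are made up for illustration, and names_pattern replaces names_sep so that a column name containing an underscore is still split correctly.

```r
library(dplyr)
library(tidyr)

# Stand-in data: built-in airquality, with one column renamed so that its
# name contains an underscore (hypothetical name, for illustration only).
df <- airquality %>% rename(solar_rad = Solar.R)

# Plain summary functions, as in the answer above.
fns <- list(mean = mean, sd = sd, median = median, min = min, max = max)

# Wrap each function once so that na.rm = TRUE is written in a single place.
fns_na_rm <- lapply(fns, function(f) function(x) f(x, na.rm = TRUE))

summary_tbl <- df %>%
  summarise(across(where(is.numeric), fns_na_rm, .names = "{.fn}_{.col}")) %>%
  pivot_longer(everything(),
               # Split at the first underscore only: "<fn>_<column name>".
               names_pattern = "^([^_]+)_(.*)$",
               names_to = c(".value", "column"))

summary_tbl
```

The result should have one row per numeric column and one column per statistic; only the single pivot_longer() call from the answer is still needed.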

qv7cva1a #2

Code like the following solves the problem, but it is by no means elegant.

library(dplyr)
library(tidyr)

# One row with columns named "<column>_<function>", e.g. "MinTemp_mean".
df %>% summarise_if(is.numeric, list(mean = ~mean(., na.rm = TRUE),
                                      sd = ~sd(., na.rm = TRUE),
                                      median = ~median(., na.rm = TRUE),
                                      min = ~min(., na.rm = TRUE),
                                      max = ~max(., na.rm = TRUE))) %>%
  pivot_longer(everything(), names_to = "func", values_to = "value") %>%
  # Splitting on "_" assumes no original column name contains "_".
  separate(col = func, into = c("column", "function_type"), sep = "_") %>%
  pivot_wider(id_cols = column, values_from = value, names_from = function_type)

laik7k3q #3

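Another idea. Clearly the final result still needs some polish (the column names, for example).
I think a data-frame-friendly version of t() could be written, but it would still rely on pivot_longer() and pivot_wider(); those functions are hard to avoid.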

library(readr)
library(dplyr, warn.conflicts = FALSE)

df <- read_csv(paste0("https://raw.githubusercontent.com/cran/",
                      "rattle/master/inst/csv/weather.csv"),
               show_col_types = FALSE)

get_summary <- function(x) {
  list(mean = mean(x, na.rm = TRUE), 
       sd = sd(x, na.rm = TRUE),
       median = median(x, na.rm = TRUE), 
       min = min(x, na.rm = TRUE), 
       max = max(x, na.rm = TRUE))
}

df |>
  reframe(across(where(is.numeric), get_summary)) |>
  t()
#>               [,1]     [,2]     [,3]    [,4]  [,5]  
#> MinTemp       7.265574 6.0258   7.45    -5.3  20.9  
#> MaxTemp       20.55027 6.690516 19.65   7.6   35.8  
#> Rainfall      1.428415 4.2258   0       0     39.8  
#> Evaporation   4.521858 2.669383 4.2     0.2   13.8  
#> Sunshine      7.909366 3.481517 8.6     0     13.6  
#> WindGustSpeed 39.84066 13.05981 39      13    98    
#> WindSpeed9am  9.651811 7.951929 7       0     41    
#> WindSpeed3pm  17.98634 8.856997 17      0     52    
#> Humidity9am   72.03552 13.13706 72      36    99    
#> Humidity3pm   44.51913 16.85095 43      13    96    
#> Pressure9am   1019.709 6.686212 1020.15 996.5 1035.7
#> Pressure3pm   1016.81  6.469422 1017.4  996.8 1033.2
#> Cloud9am      3.89071  2.956131 3.5     0     8     
#> Cloud3pm      4.02459  2.666268 4       0     8     
#> Temp9am       12.35847 5.630832 12.55   0.1   24.7  
#> Temp3pm       19.23087 6.640346 18.55   5.1   34.5  
#> RISK_MM       1.428415 4.2258   0       0     39.8

Created on 2023-05-20 with reprex v2.0.2
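The data-frame-friendly variant mentioned in this answer might look roughly like the sketch below. It is an illustration, not the answer author's code: the built-in airquality data replaces the weather CSV so that nothing has to be downloaded, get_summary() is changed to return a plain numeric vector instead of a list, and stat_names and summary_tbl are made-up names. As in the answer, reframe() is used, which requires dplyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Labels for the five statistics, in the order get_summary() returns them.
stat_names <- c("mean", "sd", "median", "min", "max")

# Returns an unnamed numeric vector of length 5 (one value per statistic).
get_summary <- function(x) {
  c(mean(x, na.rm = TRUE),
    sd(x, na.rm = TRUE),
    median(x, na.rm = TRUE),
    min(x, na.rm = TRUE),
    max(x, na.rm = TRUE))
}

summary_tbl <- airquality |>
  # Five rows per column, plus a label column identifying the statistic.
  reframe(across(where(is.numeric), get_summary), stat = stat_names) |>
  # Transpose via long format: one row per (statistic, column) pair ...
  pivot_longer(-stat, names_to = "column", values_to = "value") |>
  # ... then one row per column and one named column per statistic.
  pivot_wider(names_from = stat, values_from = value)

summary_tbl
```

Compared with t(), the result stays a tibble with ordinary numeric columns, and the statistics carry their names instead of [,1] to [,5].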
