R语言 如何使用跨函数重写相同的代码

tp5buhyn  于 2023-02-06  发布在  其他
关注(0)|答案(4)|浏览(196)

我编写了以下代码

out %>% group_by(tests0, GROUP) %>% 
  summarise(
            mean0 = mean(score0, na.rm = T),
            stderr0 = std.error(score0, na.rm = T), 
            mean7 = mean(score7, na.rm = T), 
            stederr7 = std.error(score7, na.rm = T),
            diff.std.mean = t.test(score0, score7, paired = T)$estimate, 
            p.value = t.test(score0, score7, paired = T)$p.value, 
            )

我得到了以下输出

tests0     GROUP    mean0 stderr0 mean7 stederr7 diff.std.mean p.value
   <fct>      <fct>    <dbl>   <dbl> <dbl>    <dbl>         <dbl>   <dbl>
 1 ADAS_CogT0 CONTROL   12.6   0.525  13.6    0.662        -1.15  0.00182
 2 ADAS_CogT0 TRAINING  14.0   0.613  12.6    0.570         1.40  0.00295
 3 PVF_T0     CONTROL   32.1   1.22   31.3    1.45          0.498 0.636  
 4 PVF_T0     TRAINING  31.6   1.37   34.3    1.51         -2.48  0.0102 
 5 ROCF_CT0   CONTROL   29.6   0.893  30.3    0.821        -0.180 0.835  
 6 ROCF_CT0   TRAINING  30.1   0.906  29.5    0.929         0.489 0.615  
 7 ROCF_IT0   CONTROL   12.8   0.563  12.2    0.683         0.580 0.356  
 8 ROCF_IT0   TRAINING  10.9   0.735  12.3    0.768        -1.44  0.0238 
 9 ROCF_RT0   CONTROL   12.1   0.725  12.5    0.797        -0.370 0.598  
10 ROCF_RT0   TRAINING  10.5   0.746  10.9    0.742        -0.534 0.370  
11 SVF_T0     CONTROL   35.5   1.05   34      1.15          1.42  0.107  
12 SVF_T0     TRAINING  34.1   1.04   32.9    1.16          0.962 0.231

如果我想通过across function做同样的事情,我应该怎么做才能达到同样的结果,显示在上面的代码?Actaully我遇到了麻烦,因为我从这个问题Reproduce a complex table with double headesrs下发布的答案中画了一些例子,但我不能正确地适合它。
这里是数据集
下面你可以找到我想得到相同的方法。它是一个需要.x操作的方法。

out %>%    
group_by(across(all_of(tests0, GROUP))) %>%    summarise(across(starts_with('score'),                         
list(mean = ~ mean(.x,na.rm = T),            
stderr = ~ std.error(.x, na.rm = TRUE),            
diff.std.mean = ~ t.test(.x, na.rm = T)))$estimate,              
p.value = ~ t.test(.x, na.rm = T)))$p.value)),.groups = "drop")
xxe27gdn

xxe27gdn1#

可以在across()中使用参数.names

library(dplyr)
out %>%
  group_by(tests0, GROUP) %>%
  summarize(across(c(score0, score7), sd, na.rm = TRUE, .names = "sd_{.col}"), 
            across(c(score0, score7), mean, na.rm = TRUE, .names = "mean_{.col}"), 
            diff.std.mean = t.test(score0, score7, paired = T)$estimate, 
            p.value = t.test(score0, score7, paired = T)$p.value) %>%
  ungroup()
#> `summarise()` has grouped output by 'tests0'. You can override using the `.groups` argument.
#> # A tibble: 2 x 8
#>   tests0 GROUP sd_score0 sd_score7 mean_score0 mean_score7 diff.std.mean p.value
#>   <fct>  <fct>     <dbl>     <dbl>       <dbl>       <dbl>         <dbl>   <dbl>
#> 1 ADAS_~ CONT~      3.72      4.81        12.5        13.5         -1.24 0.00471
#> 2 ADAS_~ TRAI~      4.55      4.15        14.0        12.6          1.40 0.00295

reprex package(v2.0.1)于2021年11月26日创建

    • 编辑**

如果你更喜欢列表,那么确定各个部分然后将它们绑定在一起会更容易:

library(data.table)
by <- c("tests0", "GROUP")
out_dt <- data.table::data.table(out)
means <- out_dt[, sapply(.SD, function(x) list(mean = mean(x, na.rm = TRUE))), 
                by = by, .SDcols = patterns("^score")]
sds <- out_dt[, sapply(.SD, function(x) list(sd = sd(x, na.rm = TRUE))), 
                by = by, .SDcols = patterns("^score")]
t_est <- out_dt[, .(diff.std.mean = t.test(score0, score7, paired = T)$estimate), by = by]
tpvalue <- out_dt[, .(p.value = t.test(score0, score7, paired = T)$p.value), by = by] 
list(means = means, sds = sds, diff.std.mean = t_est, p.value = tpvalue)
umuewwlo

umuewwlo2#

下面是您可能要考虑的另一种方法。首先,我将您的代码剪切并粘贴到一个函数中。唯一的更改是提取列名并删除对计算标准误差的Plotrix包的依赖。

g <- function (df)
{
  nms <- c(names(df)[1:2],
           paste0('mean',   sub(".*[a-z]","",names(df)[3])),
           paste0('stderr', sub(".*[a-z]","",names(df)[3])),
           paste0('mean',   sub(".*[a-z]","",names(df)[4])),
           paste0('stderr', sub(".*[a-z]","",names(df)[4])),
           'diff.std.mean', 'p.value')
  
  z <- df %>% group_by(df[,1:2]) %>%
    summarize(
    x1 = mean(pull(df[,3]), na.rm = T),
    x2 = sd(pull(df[,3]), na.rm=T) / sqrt(sum(!is.na(pull(df[,3])))),
    x3 = mean(pull(df[,4]), na.rm = T),
    x4 = sd(pull(df[,4]), na.rm=T) / sqrt(sum(!is.na(pull(df[,4])))),
    x5 = t.test(pull(df[,3]), pull(df[,4]), paired = T)$estimate, 
    x6 = t.test(pull(df[,3]), pull(df[,4]), paired = T)$p.value)
    colnames(z) <- nms
    return(z)
}

然后,由于测试数据只有一个因子水平,并且样本大小不足以满足您使用的plotrix::std.error函数,因此我在'test0'因子中引入了变量,将样本大小增加了一倍,并删除了未使用的水平,因为它们会导致空帧上的迭代。此外,我还添加了score8以显示如何对其他变量运行。

s <- t %>% mutate(tests0 = case_when(Education <= 8 ~ 'ADAS_CogTO', T ~ 'PVF_T0'),
                  score8 = score0 + score7)
q <- rbind(s, s)  
fct_drop(q$tests0)

然后,我按因子水平拆分框架,将函数应用于每个拆分,然后将数据重新合并到一个函数中,该函数允许您操作得分和组变量。我假设每个变量为2,这对于得分变量是安全的,因为您正在进行配对t检验,并且它很容易扩展为组变量(如果您只是将score变量移到位置1和2,并将传递给函数的所有剩余变量用作组变量)。

h <- function(df, group_vars, score_vars)
{
  z <- df %>% select(group_vars, score_vars) 
  z <- z  %>% group_by(z[,1:2]) %>%
              group_map( ~ g(.x), .keep = T) %>%
              bind_rows()
}

请注意,如果您希望将此应用于其他数据,您只需要更改传递给组和分数变量的列。如果您也想更改,这应该是相当容易的。我只是认为这是一个很好的框架,适合您正在尝试做的事情。考虑一下如何处理test0为空而test7为非空的情况(反之亦然),因为这些观察结果包括在汇总统计中,但必须排除在t检验之外。祝你好运。

x <- h(q, c("tests0", "GROUP"), c("score0", "score7")) %>%
 group_by(tests0) %>%
 pivot_wider(id_cols     = tests0,
             names_from  = GROUP,
             values_from = c("mean0","stderr0","mean7","stderr7",
                             'diff.std.mean', 'p.value'))
svgewumm

svgewumm3#

我没有一个叫std.error的函数,所以我使用了sd,当然你可以修改它。

library(dplyr)
library(readr)

out %>% 
  group_by(tests0, GROUP) %>% 
  summarise(
    across(c(score0, score7), list(mean = mean, stderr = sd), na.rm = TRUE,
           .names = '{.fn}{parse_number(.col)}'),
    with(t.test(score0, score7, paired = T),
         tibble(diff.std.mean = estimate,
                p.value)))
# # A tibble: 2 × 8
#   tests0     GROUP    mean0 stderr0 mean7 stderr7 diff.std.mean p.value
#   <fct>      <fct>    <dbl>   <dbl> <dbl>   <dbl>         <dbl>   <dbl>
# 1 ADAS_CogT0 CONTROL   12.5    3.72  13.5    4.81         -1.24 0.00471
# 2 ADAS_CogT0 TRAINING  14.0    4.55  12.6    4.15          1.40 0.00295

实际上,我会把上面的代码放在一个函数中,该函数带有xy参数,然后运行fun(df, x = score0, y = score7) .但是,为了好玩,如果你必须使用.x.y,这里有一个方法(尽管我认为这样做有点愚蠢)

df %>% 
  group_by(tests0, GROUP) %>% 
  select(starts_with('score')) %>% 
  summarise(
    across(everything(), list(mean = mean, stderr = sd), na.rm = TRUE,
           .names = '{.fn}{parse_number(.col)}'),
    across(everything(), list(list)) %>% 
      pmap_dfr(~ t.test(.x, .y, paired = TRUE)[c('estimate', 'p.value')]) %>% 
      transmute(diff.std.mean = estimate, p.value))

# # A tibble: 2 × 8
# # Groups:   tests0 [1]
#   tests0     GROUP    mean0 stderr0 mean7 stderr7 diff.std.mean p.value
#   <fct>      <fct>    <dbl>   <dbl> <dbl>   <dbl>         <dbl>   <dbl>
# 1 ADAS_CogT0 CONTROL   12.5    3.72  13.5    4.81         -1.24 0.00471
# 2 ADAS_CogT0 TRAINING  14.0    4.55  12.6    4.15          1.40 0.00295
gmxoilav

gmxoilav4#

我想到了一个可行的解决方案(这可能有帮助,也可能没有帮助)通过"手动"使用across(),而不是一次应用一列函数。结果输出是一个data.frame,其中的list列被深深地嵌套,所以unnest()会派上用场。我还使用possibly()来解决两列不存在的情况,请记住,across()可以匹配任意数量的列,并且t.test()需要xy参数。
代码:

library(tidyverse)

data <-
  df %>%
  group_by(tests0, GROUP) %>%
  summarize(
    all = list(across(starts_with("score")) %>%
      {
        tibble(
          ttest   = data.frame(possibly(~ reduce(., ~ t.test(.x, .y, paired = TRUE))[c("estimate", 'p.value')], NA)(.)),
          means   = data.frame(map(., ~ mean(.x, na.rm = TRUE)) %>% set_names(., str_replace(names(.), "\\D+", "mean"))),
          stderrs = data.frame(map(., ~ sd(.x, na.rm = TRUE)) %>% set_names(., str_replace(names(.), "\\D+", "stederr")))
        )
      })
  )
#> `summarise()` has grouped output by 'tests0'. You can override using the `.groups` argument.

data %>%
  unnest(all) %>%
  unnest(-c("tests0", "GROUP"))
#> # A tibble: 2 × 8
#> # Groups:   tests0 [1]
#>   tests0     GROUP    estimate p.value mean0 mean7 stederr0 stederr7
#>   <fct>      <fct>       <dbl>   <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1 ADAS_CogT0 CONTROL     -1.24 0.00471  12.5  13.5     3.72     4.81
#> 2 ADAS_CogT0 TRAINING     1.40 0.00295  14.0  12.6     4.55     4.15

reprex package(v2.0.1)于2021年11月29日创建

相关问题