R: How can I use apply or sapply to run multiple tests?

avwztpqn  posted 12 months ago in Other

My goal is to run chi-squared tests in less time.

data <- data.frame(
  sex = factor(c("M", "F", "M")),
  ageid = factor(c(8, 6, 7)),
  married = factor(c(2, 1, 2)),
  cagv_typ = factor(c("non-primary", "primary", "non-primary")),
  sq5_1 = factor(c(1, 1, 1)),
  sq5_2 = factor(c(0, 1, 0))
)

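Here, sex and married are the subgroup variables and the rest are outcomes. In practice, I have more than 10 outcome variables and 5 subgroup variables.
First, I wrote the following code based on the example shown here: https://epirhandbook.com/en/simple-statistical-tests.html#chi-squared-test-1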

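library(rstatix)  # chisq_test()
library(janitor)  # tabyl()
library(dplyr)    # select() and %>%

# Cross-tabulate two columns (passed as bare names) and run a chi-squared test
chis_test <- function(data, var1, var2){
  result <- data %>%
    tabyl({{var1}}, {{var2}}) %>%
    select(-1) %>%   # drop the first column, which holds var1's levels
    chisq_test()
  return(result)
}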

Next, I tried using expand_grid() to get all possible combinations:

library(tidyr)  # expand_grid()
combo <- expand_grid(x = names(data)[c(1, 3)], y = names(data)[-c(1, 3)])

The result looks like this (other variables from my real data are shown as well):

              x          y
1      cagv_typ      ageid
2           sex      ageid
3   cset_typ_bi      ageid
4     lv_eas_bi      ageid
5    und_con_bi      ageid
6    sup_ard_bi      ageid
7    job_inf_bi      ageid
8      cagv_typ    married
9           sex    married
10  cset_typ_bi    married
11    lv_eas_bi    married
12   und_con_bi    married
13   sup_ard_bi    married
14   job_inf_bi    married

I also tried a single combination, sex and cagv_typ:

chis_test(sq_catvar, sex, cagv_typ)

It returned the result I wanted:

    n statistic        p    df method          p.signif
  267      55.8 7.87e-14     1 Chi-square test ****

But when I use apply(), it fails:

apply(combo, 1, function(x) chis_test(data, x[1], x[2]))

I would like to know what went wrong. Thanks in advance!
Best wishes

dl5txlt9


In addition to @Onyambu's comment, here is a tidyverse approach (perhaps easier to follow):

library(purrr)
library(tidyr)

data <- data.frame(
  sex = factor(c("M", "F", "M")),
  ageid = factor(c(8, 6, 7)),
  married = factor(c(2, 1, 2)),
  cagv_typ = factor(c("non-primary", "primary", "non-primary")),
  sq5_1 = factor(c(1, 1, 1)),
  sq5_2 = factor(c(0, 1, 0))
)

var_names_x <- c("sex", "married")
var_names_y <- names(data)[!names(data) %in% var_names_x]
data_var_names <- tidyr::expand_grid(x_var = var_names_x, y_var = var_names_y)

purrr::map2(.x = data_var_names$x_var,
            .y = data_var_names$y_var,
            .f = ~chisq.test(table(data[[.x]], data[[.y]])))
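As for why the apply() call in the question fails: most likely it is because chis_test() uses {{ }}, which expects bare (unquoted) column names, whereas apply() hands each row to the function as a character vector, so x[1] is a string rather than a column. A minimal base R sketch of one way around this follows. The helper chisq_by_name() is hypothetical (it is not part of the question or of any package), and it swaps tabyl() and chisq_test() for table() and chisq.test() so that column names can be passed as strings.

```r
# Same toy data as in the question
data <- data.frame(
  sex = factor(c("M", "F", "M")),
  ageid = factor(c(8, 6, 7)),
  married = factor(c(2, 1, 2)),
  cagv_typ = factor(c("non-primary", "primary", "non-primary")),
  sq5_1 = factor(c(1, 1, 1)),
  sq5_2 = factor(c(0, 1, 0))
)

# Hypothetical helper: takes the column names as strings and looks them up with [[ ]]
chisq_by_name <- function(df, var1, var2) {
  chisq.test(table(df[[var1]], df[[var2]]))
}

# All subgroup/outcome pairs; keep the names as character, not factor
combo <- expand.grid(
  x = c("sex", "married"),
  y = setdiff(names(data), c("sex", "married")),
  stringsAsFactors = FALSE
)

# apply() passes each row as a named character vector, so index it by name.
# With only three observations, chisq.test() warns that the approximation
# may be incorrect; that is expected on this toy data.
res <- apply(combo, 1, function(row) chisq_by_name(data, row[["x"]], row[["y"]]))

# sapply() then pulls one number out of each test result
pvals <- sapply(res, function(r) r$p.value)
```

Here res is a list with one test result per row of combo, and pvals is a numeric vector in the same order.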

Edit: you asked for a neat way to extract the p-values. To do that, we can save the result of map2() and then use either sapply() or map_dbl():

res <- purrr::map2(.x = data_var_names$x_var,
                   .y = data_var_names$y_var,
                   .f = ~chisq.test(table(data[[.x]], data[[.y]])))

data_var_names$pval <- unlist(sapply(res, "[", "p.value"))
## OR:
data_var_names$pval <- map_dbl(res, "p.value")

This gives:

> data_var_names
# A tibble: 8 x 3
  x_var   y_var     pval
  <chr>   <chr>    <dbl>
1 sex     ageid    0.223
2 sex     cagv_typ 0.665
3 sex     sq5_1    0.564
4 sex     sq5_2    0.665
5 married ageid    0.223
6 married cagv_typ 0.665
7 married sq5_1    0.564
8 married sq5_2    0.665
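If the test statistic and degrees of freedom are wanted alongside the p-value, the same idea extends naturally. The following is only a sketch in base R, assuming the toy data from above; the names var_pairs and get_num() are made up for illustration, and Map() plus vapply() stand in for map2() and map_dbl().

```r
# Same toy data as above
data <- data.frame(
  sex = factor(c("M", "F", "M")),
  ageid = factor(c(8, 6, 7)),
  married = factor(c(2, 1, 2)),
  cagv_typ = factor(c("non-primary", "primary", "non-primary")),
  sq5_1 = factor(c(1, 1, 1)),
  sq5_2 = factor(c(0, 1, 0))
)

var_names_x <- c("sex", "married")
var_names_y <- setdiff(names(data), var_names_x)

# One row per (x, y) pair; illustrative name, not from the original answer
var_pairs <- expand.grid(x_var = var_names_x, y_var = var_names_y,
                         stringsAsFactors = FALSE)

# Run one test per pair. Warnings about small expected counts are silenced
# here only because the toy data has three rows; keep them for real data.
res <- Map(function(x, y) suppressWarnings(chisq.test(table(data[[x]], data[[y]]))),
           var_pairs$x_var, var_pairs$y_var)

# Hypothetical helper: pull one numeric element out of every test result
get_num <- function(results, element) {
  vapply(results, function(r) as.numeric(r[[element]]), numeric(1),
         USE.NAMES = FALSE)
}

var_pairs$statistic <- get_num(res, "statistic")  # chi-squared statistic
var_pairs$df        <- get_num(res, "parameter")  # degrees of freedom
var_pairs$pval      <- get_num(res, "p.value")
```

vapply() is used instead of sapply() so that an unexpected result (for example a test that returns something other than a single number) raises an error rather than silently changing the column type.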
