R: How can I automate dunn_test and ggboxplot?

5lhxktic, posted on 2023-04-18 in Other

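I am working with a data frame (called df) that looks like this (shortened here for practical reasons):
| Observed | Shannon | InvSimpson | Evenness | Month |
| --------------|--------------|--------------|--------------|--------------|
| 688 | 4.553810 | 23.365814 | 0.6969632 | February |
| 749 | 4.381557 | 15.162467 | 0.6619927 | February |
| 610 | 3.829187 | 11.178981 | 0.5970548 | February |
| 665 | 4.201113 | 16.284009 | 0.6463463 | March |
| 839 | 5.185554 | 43… | 0.7702601 | March |
| 757 | 4.782258 | 31.011366 | 0.7213751 | March |
| 516 | 3.239304 | 4.765211 | 0.5186118 | April |
| ... | ... | ... | ... | ... |
I run a post-hoc Dunn's test, then add the xy positions and plot everything as a boxplot. It works, but my code is very repetitive...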

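library(rstatix)  # dunn_test(), add_xy_position()
library(ggpubr)   # ggboxplot(), stat_pvalue_manual()
library(dplyr)    # %>%, arrange()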

obs_dunn <- dunn_test(Observed ~ Month, data=df, p.adjust.method="BH")
obs_dunn <- obs_dunn %>% arrange(p.adj)
obs_dunn <- obs_dunn %>% add_xy_position(x = "Month")
obs_bxp <- ggboxplot(df, x = "Month", y = "Observed" ) + 
  stat_pvalue_manual(obs_dunn, label = "p.adj.signif", hide.ns = TRUE)
obs_bxp

sh_dunn <- dunn_test(Shannon ~ Month, data=df, p.adjust.method="BH")
sh_dunn <- sh_dunn %>% arrange(p.adj)
sh_dunn <- sh_dunn %>% add_xy_position(x = "Month")
sh_bxp <- ggboxplot(df, x = "Month", y = "Shannon" ) + 
  stat_pvalue_manual(sh_dunn, label = "p.adj.signif", hide.ns = TRUE)
sh_bxp

inv_dunn <- dunn_test(InvSimpson ~ Month, data=df, p.adjust.method="BH")
inv_dunn <- inv_dunn %>% arrange(p.adj)
inv_dunn <- inv_dunn %>% add_xy_position(x = "Month")
inv_bxp <- ggboxplot(df, x = "Month", y = "InvSimpson" ) + 
  stat_pvalue_manual(inv_dunn, label = "p.adj.signif", hide.ns = TRUE)
inv_bxp

ev_dunn <- dunn_test(Evenness ~ Month, data=df, p.adjust.method="BH")
ev_dunn <- ev_dunn %>% arrange(p.adj)
ev_dunn <- ev_dunn %>% add_xy_position(x = "Month")
ev_bxp <- ggboxplot(df, x = "Month", y = "Evenness" ) + 
  stat_pvalue_manual(ev_dunn, label = "p.adj.signif", hide.ns = TRUE)
ev_bxp

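I get basic boxplots with significance stars added; here is the result for "Observed":

Every index (Observed, Shannon, InvSimpson, Evenness) uses the same lines of code, so I would like to write a for loop, but I am new to this and really struggling.
Do you know how I could run a loop of dunn_test(), add_xy_position() and ggboxplot() over my 4 indices? Ideally with a separate data frame as output for each index.
Even a loop for just the first step, dunn_test(), would help a lot, because I don't know where to start...
Thanks in advance for any suggestions :)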


14ifxucb #1

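Just use a formula and data as arguments of a function, and use ... to pass further arguments such as bracket.nudge.y= to adjust the silly blank space left by the hidden n.s. brackets.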

dunn_bxp <- \(fo, data, plot=TRUE, ...) {
  vars <- all.vars(fo)
  dnn <- rstatix::dunn_test(fo, data=data, p.adjust.method="BH")
  dnn_p <- rstatix::add_xy_position(dnn, x=vars[2])
  if (plot) {
    p <- ggpubr::ggboxplot(data, x=vars[2], y=vars[1]) + 
      ggpubr::stat_pvalue_manual(dnn_p, label="p.adj.signif", hide.ns=TRUE, ...)
    print(p)
    return(invisible(as.data.frame(dnn[1:8])))
  } else {
    return(as.data.frame(dnn[1:8]))
  }
}

With plot=TRUE (the default), the function draws the plot and returns the statistics invisibly.

r <- dunn_bxp(fo=observed ~ month, data=df, bracket.nudge.y=-1000)
head(r)
#        .y. group1 group2 n1 n2  statistic           p      p.adj
# 1 observed    Apr    Aug  3  3 -0.6199874 0.535266078 0.71803318
# 2 observed    Apr    Dec  3  3 -2.7124449 0.006678888 0.08816133
# 3 observed    Apr    Feb  3  3 -1.1237272 0.261128784 0.50954093
# 4 observed    Apr    Jan  3  3 -1.7049654 0.088200884 0.32174752
# 5 observed    Apr    Jul  3  3 -1.7049654 0.088200884 0.32174752
# 6 observed    Apr    Jun  3  3  0.7749843 0.438348961 0.64291181

With plot=FALSE, it only returns the statistics.

head(dunn_bxp(fo=observed ~ month, data=df, plot=FALSE))
#        .y. group1 group2 n1 n2  statistic           p      p.adj
# 1 observed    Apr    Aug  3  3 -0.6199874 0.535266078 0.71803318
# 2 observed    Apr    Dec  3  3 -2.7124449 0.006678888 0.08816133
# 3 observed    Apr    Feb  3  3 -1.1237272 0.261128784 0.50954093
# 4 observed    Apr    Jan  3  3 -1.7049654 0.088200884 0.32174752
# 5 observed    Apr    Jul  3  3 -1.7049654 0.088200884 0.32174752
# 6 observed    Apr    Jun  3  3  0.7749843 0.438348961 0.64291181

Data:
set.seed(557)
df <- data.frame(observed=round(rep(runif(12, 500, 800), each=3) + rep.int(rnorm(12, 0, 64), 3)),
                 month=rep(month.abb, each=3))
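
To cover the four indices from the question, one formula per index can be built from the column names and then handed to the function one at a time. The lines below are a minimal base-R sketch of that formula-building step only; the column names are taken from the question, `indices` and `formulas` are names made up for this illustration, and the final call to dunn_bxp() is left as a comment because it assumes a data frame that actually contains those columns.

```r
# Column names as they appear in the question's data frame (assumed to exist in df)
indices <- c("Observed", "Shannon", "InvSimpson", "Evenness")

# Build one formula per index: Observed ~ Month, Shannon ~ Month, ...
formulas <- lapply(indices, function(v) reformulate("Month", response = v))
names(formulas) <- indices

# all.vars() lists the response first and the grouping variable second,
# which is the order dunn_bxp() relies on for vars[1] and vars[2]
all.vars(formulas$Shannon)

# With such a df, the statistics for every index could then be collected as
# a named list of data frames (not run here):
# results <- lapply(formulas, dunn_bxp, data = df, plot = FALSE)
```

Using a named list keeps each index's Dunn table separate, so results$Shannon would hold the table for Shannon only.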

mefy6pfw #2

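qdread showed a super slick approach. I have a different one that uses a for loop. Since you did not post a reproducible example, I could not test my code and it may throw errors, but it should give you the idea. Good luck!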

list <- list() # creates an empty list that will hold the plots
count <- 1     # index that grows by one per iteration: the first plot is saved as the first list item, the second plot as the second, ...

for (i in c("Observed", "Shannon", "InvSimpson", "Evenness")) { # i takes the first value of c(...) in the first iteration,
                                                                # the second value in the second iteration, ...
  dunn <- eval(parse(
    text = paste0('dunn_test(', i, ' ~ Month, data=df, p.adjust.method="BH")') # paste0 builds a string, eval(parse(text="string")) executes it
  ))
  dunn <- dunn %>% arrange(p.adj)
  dunn <- dunn %>% add_xy_position(x = "Month")
  # ggboxplot() takes column names as strings, so i can be passed directly without eval(parse())
  list[[count]] <- ggboxplot(df, x = "Month", y = i) +
    stat_pvalue_manual(dunn, label = "p.adj.signif", hide.ns = TRUE)
  count <- count + 1 # the index increases by 1
}

qhhrdooz #3

Here is a function that takes the variable name as a string argument. You can call it once for each variable.

dunn_boxplot <- function(yvariable) {
  dunn <- dunn_test(formula(paste(yvariable, "~ Month")), data=df, p.adjust.method="BH") %>%
    arrange(p.adj) %>%
    add_xy_position(x = "Month")
  bxp <- ggboxplot(df, x = "Month", y = yvariable) + 
    stat_pvalue_manual(dunn, label = "p.adj.signif", hide.ns = TRUE)
  bxp  # return the plot explicitly
}

Call the function as follows, and repeat for the other three variables.

obs_bxp <- dunn_boxplot("Observed")

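The key point is that your four code blocks differ only in the name of the y variable, so that name is what gets passed as the argument. You need formula(paste(...)) to assemble the formula as a string and then coerce it to class formula.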
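
The question also asks for a separate data frame per index. One way to get both outputs, sketched here under stated assumptions, is a variant of the function that returns the Dunn table and the plot together and is applied over the index names with lapply(). The helper name `dunn_both`, its `data` argument and the toy data frame `toy` are made up for this illustration and do not come from the answer; only two of the four indices are simulated to keep the example short.

```r
library(rstatix)  # dunn_test(), add_xy_position()
library(ggpubr)   # ggboxplot(), stat_pvalue_manual()
library(dplyr)    # %>%, arrange()

# Made-up toy data using the question's column names (illustration only)
set.seed(42)
toy <- data.frame(
  Month    = factor(rep(c("Feb", "Mar", "Apr"), each = 8),
                    levels = c("Feb", "Mar", "Apr")),
  Observed = c(rnorm(8, 600, 30), rnorm(8, 750, 30), rnorm(8, 500, 30)),
  Shannon  = c(rnorm(8, 4.2, 0.2), rnorm(8, 4.8, 0.2), rnorm(8, 3.3, 0.2))
)

# Hypothetical helper: returns the Dunn table and the boxplot for one index
dunn_both <- function(yvariable, data) {
  dunn <- dunn_test(data, formula(paste(yvariable, "~ Month")),
                    p.adjust.method = "BH") %>%
    arrange(p.adj) %>%
    add_xy_position(x = "Month")
  bxp <- ggboxplot(data, x = "Month", y = yvariable) +
    stat_pvalue_manual(dunn, label = "p.adj.signif", hide.ns = TRUE)
  list(dunn = dunn, plot = bxp)
}

# Apply the helper to every index; setNames() makes the result a named list
indices <- c("Observed", "Shannon")
results <- lapply(setNames(indices, indices), dunn_both, data = toy)

results$Shannon$dunn   # Dunn table for Shannon
results$Observed$plot  # boxplot for Observed
```

Passing the data frame as an argument instead of relying on a global df makes the helper reusable on other data sets with the same column layout.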
