Matched comparison of control and treated samples using a for loop

au9on6nz posted on 2023-06-19 in Other

I have a dummy df and its associated metadata data frame meta_df:

df <- data.frame(Gene=c("gene1","gene2","gene3"),
                 Co_Mark_A_Treat= c(10,11,12),
                 Co_Mark_B_Treat= c(10,11,12),
                 Co_Mark_C_Treat= c(10,11,12),
                 Co_Mark_Ctr= c(15,16,17),
                 Co_Paul_A_Treat= c(10,11,12),
                 Co_Paul_B_Treat= c(10,11,12),
                 Co_Paul_C_Treat= c(10,11,12),
                 Co_Paul_Ctr= c(15,16,17))

meta_df <- data.frame(Sample=c("Mark_A_Treat","Mark_B_Treat", "Mark_C_Treat", "Mark_Ctr"
                               ,"Paul_A_Treat","Paul_B_Treat","Paul_C_Treat", "Paul_Ctr"),
                      Name=c("Mark","Mark", "Mark", "Mark","Paul","Paul","Paul","Paul"))

row.names(meta_df) <- paste0(c("Co_Mark_A_Treat","Co_Mark_B_Treat", "Co_Mark_C_Treat", "Co_Mark_Ctr",
                               "Co_Paul_A_Treat","Co_Paul_B_Treat", "Co_Paul_C_Treat","Co_Paul_Ctr"))

I want to run matched comparisons between the ctr and treat groups based on Name. To do this, I first retrieved two lists, ctr and treat, from meta_df:

groups <- unique(meta_df$Sample)
  
groups_treat <-groups[grep("Treat",groups)]

groups_ctr <- groups[grep("Ctr",groups)]

Then I run a for loop to get the comparisons:

for (j in 1:length(groups_treat)) { 
  
  for (i in 1:length(groups_ctr)){
    
    
    group1 <- groups_ctr[i]
    
    cond1 <- rownames(meta_df[grep(group1,meta_df$Sample),])
    
    
    group2 <- groups_treat[j]
    
    cond2 <- rownames(meta_df[grep(group2,meta_df$Sample),])
    
    print(paste("processing ", group1 , " versus ", group2))
    
    }}

This works, but I get all possible comparisons, including the ones I do not want (Mark vs Paul):

[1] "processing  Mark_Ctr  versus  Mark_A_Treat"
[1] "processing  Paul_Ctr  versus  Mark_A_Treat"
[1] "processing  Mark_Ctr  versus  Mark_B_Treat"
[1] "processing  Paul_Ctr  versus  Mark_B_Treat"
[1] "processing  Mark_Ctr  versus  Mark_C_Treat"
[1] "processing  Paul_Ctr  versus  Mark_C_Treat"
[1] "processing  Mark_Ctr  versus  Paul_A_Treat"
[1] "processing  Paul_Ctr  versus  Paul_A_Treat"
[1] "processing  Mark_Ctr  versus  Paul_B_Treat"
[1] "processing  Paul_Ctr  versus  Paul_B_Treat"
[1] "processing  Mark_Ctr  versus  Paul_C_Treat"
[1] "processing  Paul_Ctr  versus  Paul_C_Treat"

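How can I change the code to avoid comparing Mark and Paul samples, and only compare Mark ctr with Mark treat and Paul ctr with Paul treat?
This is what the result should look like: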

[1] "processing  Mark_Ctr  versus  Mark_A_Treat"
[1] "processing  Mark_Ctr  versus  Mark_B_Treat"
[1] "processing  Mark_Ctr  versus  Mark_C_Treat"
[1] "processing  Paul_Ctr  versus  Paul_A_Treat"
[1] "processing  Paul_Ctr  versus  Paul_B_Treat"
[1] "processing  Paul_Ctr  versus  Paul_C_Treat"

Thanks again for your help.


pgvzfuti1#

In R, you can use the expand.grid function to simplify creating these combinations.

Step 1

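First, let's use expand.grid to define a function that builds these combinations for any column based on a pattern (the default arguments are "Ctr" versus "Treat"):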

make_comps <- function(col, pattern = c("Ctr","Treat")) {
 comps<- expand.grid(
    col[grep(pattern[1],col)],
    col[grep(pattern[2],col)])
 paste("processing ", comps$Var1 , " versus ", comps$Var2)
}

Notice how I reused elements of your code, such as the grep selection and the paste function.
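To make the role of expand.grid more concrete, here is a minimal sketch of what it returns before paste is applied. The vector samples and the column names ctr and treat are made up for this illustration and are not part of the original code; stringsAsFactors = FALSE is only there to keep the columns as plain character vectors.

```r
# Hypothetical sample names, used only for illustration
samples <- c("Mark_A_Treat", "Mark_B_Treat", "Mark_Ctr")

# Every control is paired with every treatment found in the vector
comps <- expand.grid(
  ctr   = samples[grep("Ctr", samples)],
  treat = samples[grep("Treat", samples)],
  stringsAsFactors = FALSE
)

# One row per control/treatment pair
comps
```

Because the pairing only uses what is in the vector passed in, restricting that vector to a single Name is what keeps Mark and Paul apart.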

Step 2

Next, we can simplify most of the code by splitting meta_df on the Name column ("Mark" and "Paul") into a list of two subsets, alist:

alist <- split(meta_df,meta_df$Name)

Step 3

We can now iterate over alist and use the function from step 1:

lapply(alist, function(p) make_comps(p$Sample)) |>unlist(use.names = F)

[1] "processing  Mark_Ctr  versus  Mark_A_Treat"
[2] "processing  Mark_Ctr  versus  Mark_B_Treat"
[3] "processing  Mark_Ctr  versus  Mark_C_Treat"
[4] "processing  Paul_Ctr  versus  Paul_A_Treat"
[5] "processing  Paul_Ctr  versus  Paul_B_Treat"
[6] "processing  Paul_Ctr  versus  Paul_C_Treat"

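Note: I used unlist at the end because lapply returns a list. Depending on what you want to do, you could also use sapply instead of lapply, or stack instead of unlist to get a data frame instead of a vector.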
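As a rough sketch of the stack variant mentioned in the note, the code below rebuilds a shortened meta_df (two treatments per person instead of three, purely to keep the example small) and repeats make_comps so that it runs on its own. The names res and comp_df are chosen here for illustration.

```r
# Shortened metadata, for illustration only
meta_df <- data.frame(
  Sample = c("Mark_A_Treat", "Mark_B_Treat", "Mark_Ctr",
             "Paul_A_Treat", "Paul_B_Treat", "Paul_Ctr"),
  Name   = c("Mark", "Mark", "Mark", "Paul", "Paul", "Paul")
)

# Same function as in step 1
make_comps <- function(col, pattern = c("Ctr", "Treat")) {
  comps <- expand.grid(col[grep(pattern[1], col)],
                       col[grep(pattern[2], col)])
  paste("processing ", comps$Var1, " versus ", comps$Var2)
}

# One list element per Name, each holding that person's comparisons
alist <- split(meta_df, meta_df$Name)
res   <- lapply(alist, function(p) make_comps(p$Sample))

# stack() turns the named list into a two-column data frame:
# 'values' holds the comparison strings, 'ind' the list names
comp_df <- stack(res)
comp_df
```

Keeping the Name next to each comparison in the ind column can be handy if the results need to be filtered or merged by person later.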


pgvzfuti2#

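You could split your meta frame on Name and lapply over the sublists, using mapply for the messages and the calculations.
I hope you don't mind that I extended your df into a toy matrix counts, adding Vany.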

FUN <- \(data, meta) {
  split(meta, meta$Name) |>
    lapply(\(x) {
      tr <- grep('Treat', rownames(x), value=TRUE)
      ct <- grep('Ctr', rownames(x), value=TRUE)
      ot <- grep('Treat', x$Sample, value=TRUE)
      oc <- grep('Ctr', x$Sample, value=TRUE)
      mapply(\(tr, ot, oc) {
        message(sprintf('processing %s vs %s', oc, ot))
        ## any calculations whatsoever, e.g.:
        mean(data[, ct] - data[, tr])
      }, tr, ot, oc)
    })
}

res <- FUN(counts, meta_df)
# processing Mark_Ctr vs Mark_A_Treat
# processing Mark_Ctr vs Mark_B_Treat
# processing Mark_Ctr vs Mark_C_Treat
# processing Paul_Ctr vs Paul_A_Treat
# processing Paul_Ctr vs Paul_B_Treat
# processing Paul_Ctr vs Paul_C_Treat
# processing Vany_Ctr vs Vany_A_Treat
# processing Vany_Ctr vs Vany_B_Treat
# processing Vany_Ctr vs Vany_C_Treat
    
res
# $Mark
# Co_Mark_A_Treat Co_Mark_B_Treat Co_Mark_C_Treat 
#     4.666667     6.666667     1.000000 
# 
# $Paul
# Co_Paul_A_Treat Co_Paul_B_Treat Co_Paul_C_Treat 
#     -9.3333333   -0.6666667  -10.0000000 
# 
# $Vany
# Co_Vany_A_Treat Co_Vany_B_Treat Co_Vany_C_Treat 
#     6.000000    -5.333333     1.666667

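Use the genes as row names rather than as a column, because they are of character type, unlike the numeric counts. You can then use a numeric matrix format, which is more memory-efficient and faster.
To easily convert the df from the OP into such a matrix, do the following: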

counts1 <- as.matrix(`rownames<-`(df[-1], df[, 1]))
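For illustration, here is a small self-contained check of that conversion on a cut-down version of the OP's df (only two sample columns are kept, to keep it short). The variable delta is a name introduced here, not part of the original answer.

```r
# Cut-down version of the OP's df, for illustration only
df <- data.frame(Gene = c("gene1", "gene2", "gene3"),
                 Co_Mark_A_Treat = c(10, 11, 12),
                 Co_Mark_Ctr     = c(15, 16, 17))

# Drop the Gene column and use it as row names, then convert to a matrix
counts1 <- as.matrix(`rownames<-`(df[-1], df[, 1]))

# The result is a purely numeric matrix with genes as row names
is.numeric(counts1)
rownames(counts1)

# Columns can now be selected by the row names of meta_df,
# e.g. the per-gene difference between control and one treatment
delta <- counts1[, "Co_Mark_Ctr"] - counts1[, "Co_Mark_A_Treat"]
delta
```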
Data:
counts <- structure(c(113L, 94L, 100L, 88L, 98L, 115L, 99L, 120L, 99L, 
113L, 122L, 86L, 113L, 109L, 108L, 104L, 109L, 91L, 96L, 119L, 
117L, 103L, 103L, 96L, 82L, 90L, 104L, 107L, 110L, 93L, 92L, 
92L, 105L, 96L, 102L, 96L), dim = c(3L, 12L), dimnames = list(
    c("Gene_1", "Gene_2", "Gene_3"), c("Co_Mark_A_Treat", "Co_Mark_B_Treat", 
    "Co_Mark_C_Treat", "Co_Mark_Ctr", "Co_Paul_A_Treat", "Co_Paul_B_Treat", 
    "Co_Paul_C_Treat", "Co_Paul_Ctr", "Co_Vany_A_Treat", "Co_Vany_B_Treat", 
    "Co_Vany_C_Treat", "Co_Vany_Ctr")))

meta_df <- structure(list(Sample = c("Mark_A_Treat", "Mark_B_Treat", "Mark_C_Treat", 
"Mark_Ctr", "Paul_A_Treat", "Paul_B_Treat", "Paul_C_Treat", "Paul_Ctr", 
"Vany_A_Treat", "Vany_B_Treat", "Vany_C_Treat", "Vany_Ctr"), 
    Name = c("Mark", "Mark", "Mark", "Mark", "Paul", "Paul", 
    "Paul", "Paul", "Vany", "Vany", "Vany", "Vany")), class = "data.frame", row.names = c("Co_Mark_A_Treat", 
"Co_Mark_B_Treat", "Co_Mark_C_Treat", "Co_Mark_Ctr", "Co_Paul_A_Treat", 
"Co_Paul_B_Treat", "Co_Paul_C_Treat", "Co_Paul_Ctr", "Co_Vany_A_Treat", 
"Co_Vany_B_Treat", "Co_Vany_C_Treat", "Co_Vany_Ctr"))
