I have a dummy df and its associated metadata data frame, meta_df:
df <- data.frame(Gene=c("gene1","gene2","gene3"),
Co_Mark_A_Treat= c(10,11,12),
Co_Mark_B_Treat= c(10,11,12),
Co_Mark_C_Treat= c(10,11,12),
Co_Mark_Ctr= c(15,16,17),
Co_Paul_A_Treat= c(10,11,12),
Co_Paul_B_Treat= c(10,11,12),
Co_Paul_C_Treat= c(10,11,12),
Co_Paul_Ctr= c(15,16,17))
meta_df <- data.frame(Sample=c("Mark_A_Treat","Mark_B_Treat", "Mark_C_Treat", "Mark_Ctr"
,"Paul_A_Treat","Paul_B_Treat","Paul_C_Treat", "Paul_Ctr"),
Name=c("Mark","Mark", "Mark", "Mark","Paul","Paul","Paul","Paul"))
row.names(meta_df) <- paste0(c("Co_Mark_A_Treat","Co_Mark_B_Treat", "Co_Mark_C_Treat", "Co_Mark_Ctr",
"Co_Paul_A_Treat","Co_Paul_B_Treat", "Co_Paul_C_Treat","Co_Paul_Ctr"))
I want to run matched comparisons between the ctr and treat groups, based on Name. To do that, I first retrieved two lists, ctr and treat, from meta_df:
groups <- unique(meta_df$Sample)
groups_treat <-groups[grep("Treat",groups)]
groups_ctr <- groups[grep("Ctr",groups)]
Then I run a for loop to get the comparisons:
for (j in seq_along(groups_treat)) {
  for (i in seq_along(groups_ctr)) {
    group1 <- groups_ctr[i]
    cond1 <- rownames(meta_df[grep(group1, meta_df$Sample), ])
    group2 <- groups_treat[j]
    cond2 <- rownames(meta_df[grep(group2, meta_df$Sample), ])
    print(paste("processing ", group1, " versus ", group2))
  }
}
This works, but I get every possible comparison, including the ones I don't want (Mark vs. Paul):
[1] "processing Mark_Ctr versus Mark_A_Treat"
[1] "processing Paul_Ctr versus Mark_A_Treat"
[1] "processing Mark_Ctr versus Mark_B_Treat"
[1] "processing Paul_Ctr versus Mark_B_Treat"
[1] "processing Mark_Ctr versus Mark_C_Treat"
[1] "processing Paul_Ctr versus Mark_C_Treat"
[1] "processing Mark_Ctr versus Paul_A_Treat"
[1] "processing Paul_Ctr versus Paul_A_Treat"
[1] "processing Mark_Ctr versus Paul_B_Treat"
[1] "processing Paul_Ctr versus Paul_B_Treat"
[1] "processing Mark_Ctr versus Paul_C_Treat"
[1] "processing Paul_Ctr versus Paul_C_Treat"
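How can I change the code so that it never compares Mark samples with Paul samples, and only compares Mark ctr with Mark treat and Paul ctr with Paul treat?
Here is what the result should look like: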
[1] "processing Mark_Ctr versus Mark_A_Treat"
[1] "processing Mark_Ctr versus Mark_B_Treat"
[1] "processing Mark_Ctr versus Mark_C_Treat"
[1] "processing Paul_Ctr versus Paul_A_Treat"
[1] "processing Paul_Ctr versus Paul_B_Treat"
[1] "processing Paul_Ctr versus Paul_C_Treat"
Thanks again for your help.
2 answers
pgvzfuti1#
In R, you can use the expand.grid function to simplify creating these combinations.
Step 1
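First, let's use expand.grid to define a function that builds these combinations for any column, based on two patterns (the default arguments are "Ctr" versus "Treat"). Note how I reused elements of your code, such as grep for the selection and the paste function.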
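The answerer's function is not shown here, so the following is only a minimal sketch of what such a helper might look like. The name make_comparisons and its arguments are assumptions made for illustration; only expand.grid, grep and paste come from the answer's description.

```r
# Hypothetical helper (name and signature are assumptions, not the answerer's code).
# Pairs every value matching `ctr` with every value matching `treat`.
make_comparisons <- function(x, ctr = "Ctr", treat = "Treat") {
  x <- unique(as.character(x))
  pairs <- expand.grid(
    ctr   = grep(ctr, x, value = TRUE),    # control samples
    treat = grep(treat, x, value = TRUE),  # treated samples
    stringsAsFactors = FALSE
  )
  paste("processing", pairs$ctr, "versus", pairs$treat)
}

# Within a single person there is one control, so each treatment is paired with it.
make_comparisons(c("Mark_A_Treat", "Mark_B_Treat", "Mark_Ctr"))
```

Called on all eight samples at once, this helper would still produce the unwanted Mark vs. Paul pairs, which is why the data needs to be grouped by Name first.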
Step 2
Next, we can simplify most of the code by splitting the metadata on the Name column ("Mark" and "Paul") into alist, a list containing two subsets.
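A sketch of this split, assuming the metadata is the meta_df from the question (rebuilt here in a shorter form so the snippet runs on its own):

```r
# meta_df as in the question, written more compactly.
meta_df <- data.frame(
  Sample = c("Mark_A_Treat", "Mark_B_Treat", "Mark_C_Treat", "Mark_Ctr",
             "Paul_A_Treat", "Paul_B_Treat", "Paul_C_Treat", "Paul_Ctr"),
  Name   = rep(c("Mark", "Paul"), each = 4)
)
rownames(meta_df) <- paste0("Co_", meta_df$Sample)

# One data frame per person; the list names come from the Name column.
alist <- split(meta_df, meta_df$Name)
names(alist)
```

Each element of alist keeps its original row names, so the matching column names in df can still be looked up per person.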
Step 3
We can now iterate over alist and apply the function from Step 1.
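Putting the three steps together, a self-contained sketch could look like this. The helper make_comparisons is a hypothetical stand-in for the Step 1 function, not the answerer's original code.

```r
meta_df <- data.frame(
  Sample = c("Mark_A_Treat", "Mark_B_Treat", "Mark_C_Treat", "Mark_Ctr",
             "Paul_A_Treat", "Paul_B_Treat", "Paul_C_Treat", "Paul_Ctr"),
  Name   = rep(c("Mark", "Paul"), each = 4)
)

# Step 1 (assumed helper): all ctr/treat pairs within one vector of sample names.
make_comparisons <- function(x, ctr = "Ctr", treat = "Treat") {
  pairs <- expand.grid(ctr   = grep(ctr, x, value = TRUE),
                       treat = grep(treat, x, value = TRUE),
                       stringsAsFactors = FALSE)
  paste("processing", pairs$ctr, "versus", pairs$treat)
}

# Step 2: one subset per person.
alist <- split(meta_df, meta_df$Name)

# Step 3: build the comparisons inside each subset, then flatten the list.
res <- unlist(lapply(alist, function(d) make_comparisons(d$Sample)),
              use.names = FALSE)
res
```

Because the pairs are built inside each subset, res holds only the six within-person comparisons requested in the question, in the same order.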
Note: I used unlist at the end because lapply returns a list. Depending on what you want to do, you could also use sapply instead of lapply, or use stack instead of unlist to get a data frame rather than a list.
pgvzfuti2#
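You can split your metadata frame by Name and loop over the sublists with lapply and mapply, emitting messages and doing the calculations. I hope you don't mind that I extended your df by adding Vany to the toy matrix counts. Use the genes as row names rather than as a column, because they are of type character, unlike the counts, which are numeric. You can then work with a numeric matrix, which is more memory-efficient and faster.
To easily convert df from the OP into such a matrix, do the following: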
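The answerer's conversion code is not shown here. The sketch below is one way to do it with base R, followed by a split/lapply/mapply loop in the spirit of the description above. The difference treat minus ctr is only a placeholder calculation, and the extra "Vany" samples mentioned in the answer are not reproduced.

```r
df <- data.frame(
  Gene = c("gene1", "gene2", "gene3"),
  Co_Mark_A_Treat = c(10, 11, 12), Co_Mark_B_Treat = c(10, 11, 12),
  Co_Mark_C_Treat = c(10, 11, 12), Co_Mark_Ctr     = c(15, 16, 17),
  Co_Paul_A_Treat = c(10, 11, 12), Co_Paul_B_Treat = c(10, 11, 12),
  Co_Paul_C_Treat = c(10, 11, 12), Co_Paul_Ctr     = c(15, 16, 17)
)

# Drop the character column and keep the gene names as row names.
counts <- as.matrix(df[, -1])
rownames(counts) <- df$Gene

# Metadata keyed by the column names of counts.
meta_df <- data.frame(
  Sample    = sub("^Co_", "", colnames(counts)),
  Name      = rep(c("Mark", "Paul"), each = 4),
  row.names = colnames(counts)
)

# Per person: compare each treated column with that person's control column.
res <- lapply(split(meta_df, meta_df$Name), function(m) {
  ctr   <- rownames(m)[grepl("Ctr", m$Sample)]
  treat <- rownames(m)[grepl("Treat", m$Sample)]
  mapply(function(tr, ct) {
    message("processing ", ct, " versus ", tr)
    counts[, tr] - counts[, ct]  # placeholder calculation
  }, treat, ctr, SIMPLIFY = FALSE)
})
```

The result is a nested list, indexed first by person and then by treated sample, for example res$Mark$Co_Mark_A_Treat.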