How do I subset a data frame based on matches with another data frame?

k10s72fa, posted on 2023-07-31 in Other

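For each unique gs_name, find the corresponding gene_symbol values. Based on the matches between rownames(all.deg) and gene_symbol, subset the rows of all.deg and save the result as a separate matrix (one per gs_name).
Attempt 1: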

for (i in unique(kegg$gs_name)) {              
  for (j in kegg$gene_symbol) {                 
    mat <- as.matrix(all.deg[rownames(all.deg) %in% j,])     
  }
}

Attempt 2:

all.deg <- tibble::rownames_to_column(as.data.frame(all.deg), "gene")
inner_join(kegg, as.data.frame(all.deg), by = c("gs_name", "gene"="gene_symbol"))


Traceback:

Error in `inner_join()`:
! Join columns in `x` must be present in the data.
✖ Problem with `gene`.
Run `rlang::last_trace()` to see where the error occurred.


Input:

> all.deg <- structure(c(16.0169585624867, 14.3983080662428, 12.7844219145156, 
12.6674945373237, 13.8584047354367, 13.563719599839, 13.6166993468069, 
12.9748157402651, 12.7386065050292, 12.2201616898331, 11.3657998135948, 
11.8253392160132, 12.1132082166185, 11.5123143882139, 10.2967924742924, 
13.7513874043739, 13.2403954818698, 12.4196432226432, 12.4676109090624, 
12.1390647972695, 12.3013113392588, 12.4867673484914, 11.3693921877853, 
10.6359730348998, 10.0122721528039), dim = c(5L, 5L), dimnames = list(
    c("FTL", "MIGA2", "HLA.A", "THBD", "CD74"), c("TCGA.2K.A9WE.01", 
    "TCGA.2Z.A9J1.01", "TCGA.2Z.A9J3.01", "TCGA.2Z.A9J6.01", 
    "TCGA.2Z.A9J7.01")))

> kegg <- structure(list(gs_cat = c("C2", "C2", "C2", "C2"), gs_subcat = c("CP:KEGG", 
"CP:KEGG", "CP:KEGG", "CP:KEGG"), gs_name = c("adipocytokine_signaling_pathway", 
"adipocytokine_signaling_pathway", "alanine_aspartate_and_glutamate_metabolism", 
"alanine_aspartate_and_glutamate_metabolism"), gene_symbol = c("ACACB", 
"ACSL1", "CPS1", "DDO"), entrez_gene = c(32L, 2180L, 1373L, 8528L
), ensembl_gene = c("ENSG00000076555", "ENSG00000151726", "ENSG00000021826", 
"ENSG00000203797"), human_gene_symbol = c("ACACB", "FTL", "CPS1", 
"DDO"), human_entrez_gene = c(32L, 2180L, 1373L, 8528L), human_ensembl_gene = c("ENSG00000076555", 
"ENSG00000151726", "ENSG00000021826", "ENSG00000203797"), gs_id = c("M10462", 
"M10462", "M17758", "M17758"), gs_pmid = c("", "", "", ""), gs_geoid = c("", 
"", "", ""), gs_exact_source = c("hsa04920", "hsa04920", "hsa00250", 
"hsa00250"), gs_url = c("http://www.genome.jp/kegg/pathway/hsa/hsa04920.html", 
"http://www.genome.jp/kegg/pathway/hsa/hsa04920.html", "http://www.genome.jp/kegg/pathway/hsa/hsa00250.html", 
"http://www.genome.jp/kegg/pathway/hsa/hsa00250.html"), gs_description = c("Adipocytokine signaling pathway", 
"Adipocytokine signaling pathway", "Alanine, aspartate and glutamate metabolism", 
"Alanine, aspartate and glutamate metabolism")), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))


Expected output:

structure(c(16.0169585624867, 13.563719599839, 11.3657998135948, 
13.7513874043739, 12.3013113392588), dim = c(1L, 5L), dimnames = list(
    "FTL", c("TCGA.2K.A9WE.01", "TCGA.2Z.A9J1.01", "TCGA.2Z.A9J3.01", 
    "TCGA.2Z.A9J6.01", "TCGA.2Z.A9J7.01")))


Store all the resulting data frames in a list.
Related question: Iterate over each row to obtain matches between row values and the rownames of another dataframe df2, then subset df2

bgibtngc #1

Two options:
Credit: MrFlick

all.deg[rownames(all.deg) %in% kegg$human_gene_symbol, , drop = FALSE]
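
The one-liner above returns a single matrix across all gene sets. The question asks for one matrix per gs_name, so here is a minimal sketch of one way to extend it, using base R's split() and lapply(). The objects deg_toy and kegg_toy are small made-up stand-ins for all.deg and kegg, not the real data, and matching on human_gene_symbol is an assumption carried over from the line above.

```r
# Toy stand-ins for all.deg and kegg (hypothetical values, for illustration only).
deg_toy <- matrix(
  1:6, nrow = 3,
  dimnames = list(c("FTL", "CPS1", "CD74"), c("s1", "s2"))
)
kegg_toy <- data.frame(
  gs_name = c("pathway_a", "pathway_a", "pathway_b"),
  human_gene_symbol = c("ACACB", "FTL", "CPS1")
)

# Group the gene symbols by gene set: a named list with one character vector per gs_name.
genes_by_set <- split(kegg_toy$human_gene_symbol, kegg_toy$gs_name)

# Subset the expression matrix once per gene set.
# drop = FALSE keeps a one-row result as a matrix instead of a plain vector.
mats <- lapply(genes_by_set, function(genes) {
  deg_toy[rownames(deg_toy) %in% genes, , drop = FALSE]
})

# Optionally discard gene sets that matched no rows.
mats <- Filter(function(m) nrow(m) > 0, mats)
```

Each element of mats is named after its gene set, so mats$pathway_a (or mats[["pathway_a"]]) retrieves the matrix for that set.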

Another, more tidyverse-y way:

filter(as.data.frame(all.deg), rownames(all.deg) %in% kegg$human_gene_symbol)
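
On the inner_join() error in the question: the names in a named `by` vector refer to columns of the first table (`x`) and the values to columns of the second (`y`). In the question's call `x` is kegg, which has no `gene` column, and the unnamed "gs_name" entry would have to exist in both tables. A hedged sketch of a join that should work is below. deg_toy and kegg_toy are made-up stand-ins for all.deg and kegg, and joining on human_gene_symbol is an assumption (in the sample input, FTL appears only in that column).

```r
library(dplyr)
library(tibble)

# Toy stand-ins for all.deg and kegg (hypothetical values, for illustration only).
deg_toy <- matrix(
  1:6, nrow = 3,
  dimnames = list(c("FTL", "CPS1", "CD74"), c("s1", "s2"))
)
kegg_toy <- tibble(
  gs_name = c("pathway_a", "pathway_a", "pathway_b"),
  human_gene_symbol = c("ACACB", "FTL", "CPS1")
)

# Move the row names into a regular column so they can be used as a join key.
deg_df <- rownames_to_column(as.data.frame(deg_toy), "gene")

# x = deg_df (has `gene`), y = kegg_toy (has `human_gene_symbol`).
joined <- inner_join(deg_df, kegg_toy, by = c("gene" = "human_gene_symbol"))

# One data frame per gene set, stored in a named list.
by_set <- split(joined, joined$gs_name)
```

Unlike the filter() call, the join keeps gs_name next to each matched gene, which is what makes the final split() by gene set possible.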
