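For each unique gs_name, find the corresponding gene_symbol values. Based on the matches between rownames(all.deg) and gene_symbol, subset the rows of all.deg and save the result as a separate matrix (one per gs_name).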
Attempt 1:
for (i in unique(kegg$gs_name)) {
for (j in kegg$gene_symbol) {
mat <- as.matrix(all.deg[rownames(all.deg) %in% j,])
}
}
Attempt 2:
all.deg <- tibble::rownames_to_column(as.data.frame(all.deg), "gene")
inner_join(kegg, as.data.frame(all.deg), by = c("gs_name", "gene"="gene_symbol"))
Traceback:
Error in `inner_join()`:
! Join columns in `x` must be present in the data.
✖ Problem with `gene`.
Run `rlang::last_trace()` to see where the error occurred.
Input:
> all.deg <- structure(c(16.0169585624867, 14.3983080662428, 12.7844219145156,
12.6674945373237, 13.8584047354367, 13.563719599839, 13.6166993468069,
12.9748157402651, 12.7386065050292, 12.2201616898331, 11.3657998135948,
11.8253392160132, 12.1132082166185, 11.5123143882139, 10.2967924742924,
13.7513874043739, 13.2403954818698, 12.4196432226432, 12.4676109090624,
12.1390647972695, 12.3013113392588, 12.4867673484914, 11.3693921877853,
10.6359730348998, 10.0122721528039), dim = c(5L, 5L), dimnames = list(
c("FTL", "MIGA2", "HLA.A", "THBD", "CD74"), c("TCGA.2K.A9WE.01",
"TCGA.2Z.A9J1.01", "TCGA.2Z.A9J3.01", "TCGA.2Z.A9J6.01",
"TCGA.2Z.A9J7.01")))
> kegg <- structure(list(gs_cat = c("C2", "C2", "C2", "C2"), gs_subcat = c("CP:KEGG",
"CP:KEGG", "CP:KEGG", "CP:KEGG"), gs_name = c("adipocytokine_signaling_pathway",
"adipocytokine_signaling_pathway", "alanine_aspartate_and_glutamate_metabolism",
"alanine_aspartate_and_glutamate_metabolism"), gene_symbol = c("ACACB",
"ACSL1", "CPS1", "DDO"), entrez_gene = c(32L, 2180L, 1373L, 8528L
), ensembl_gene = c("ENSG00000076555", "ENSG00000151726", "ENSG00000021826",
"ENSG00000203797"), human_gene_symbol = c("ACACB", "FTL", "CPS1",
"DDO"), human_entrez_gene = c(32L, 2180L, 1373L, 8528L), human_ensembl_gene = c("ENSG00000076555",
"ENSG00000151726", "ENSG00000021826", "ENSG00000203797"), gs_id = c("M10462",
"M10462", "M17758", "M17758"), gs_pmid = c("", "", "", ""), gs_geoid = c("",
"", "", ""), gs_exact_source = c("hsa04920", "hsa04920", "hsa00250",
"hsa00250"), gs_url = c("http://www.genome.jp/kegg/pathway/hsa/hsa04920.html",
"http://www.genome.jp/kegg/pathway/hsa/hsa04920.html", "http://www.genome.jp/kegg/pathway/hsa/hsa00250.html",
"http://www.genome.jp/kegg/pathway/hsa/hsa00250.html"), gs_description = c("Adipocytokine signaling pathway",
"Adipocytokine signaling pathway", "Alanine, aspartate and glutamate metabolism",
"Alanine, aspartate and glutamate metabolism")), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))
Expected output:
structure(c(16.0169585624867, 13.563719599839, 11.3657998135948,
13.7513874043739, 12.3013113392588), dim = c(1L, 5L), dimnames = list(
"FTL", c("TCGA.2K.A9WE.01", "TCGA.2Z.A9J1.01", "TCGA.2Z.A9J3.01",
"TCGA.2Z.A9J6.01", "TCGA.2Z.A9J7.01")))
Store all of the resulting data frames in a list.
Related question: Iterate over each row to obtain matches between row values and the rownames of another dataframe df2, then subset df2
1 Answer
Two options:
Credit: MrFlick
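A minimal base R sketch of one such option, using split() and lapply(). It assumes the match should be made on human_gene_symbol, because in the posted sample only that column contains FTL; swap in gene_symbol if that is the column that matches your row names. The kegg data frame below is cut down to the columns the sketch needs, and the names genes_by_set and mats are illustrative.

```r
# all.deg exactly as posted in the question (5 genes x 5 samples)
all.deg <- structure(c(16.0169585624867, 14.3983080662428, 12.7844219145156,
12.6674945373237, 13.8584047354367, 13.563719599839, 13.6166993468069,
12.9748157402651, 12.7386065050292, 12.2201616898331, 11.3657998135948,
11.8253392160132, 12.1132082166185, 11.5123143882139, 10.2967924742924,
13.7513874043739, 13.2403954818698, 12.4196432226432, 12.4676109090624,
12.1390647972695, 12.3013113392588, 12.4867673484914, 11.3693921877853,
10.6359730348998, 10.0122721528039), dim = c(5L, 5L), dimnames = list(
c("FTL", "MIGA2", "HLA.A", "THBD", "CD74"), c("TCGA.2K.A9WE.01",
"TCGA.2Z.A9J1.01", "TCGA.2Z.A9J3.01", "TCGA.2Z.A9J6.01",
"TCGA.2Z.A9J7.01")))

# Reduced copy of kegg: only the columns used here
kegg <- data.frame(
  gs_name = c("adipocytokine_signaling_pathway",
              "adipocytokine_signaling_pathway",
              "alanine_aspartate_and_glutamate_metabolism",
              "alanine_aspartate_and_glutamate_metabolism"),
  gene_symbol = c("ACACB", "ACSL1", "CPS1", "DDO"),
  human_gene_symbol = c("ACACB", "FTL", "CPS1", "DDO"),
  stringsAsFactors = FALSE
)

# One character vector of gene names per gene set
# (assumption: human_gene_symbol is the column to match on)
genes_by_set <- split(kegg$human_gene_symbol, kegg$gs_name)

# Subset all.deg once per gene set; drop = FALSE keeps a
# single-row result as a matrix instead of a plain vector
mats <- lapply(genes_by_set, function(g) {
  all.deg[rownames(all.deg) %in% g, , drop = FALSE]
})

mats$adipocytokine_signaling_pathway
```

The result is a named list with one matrix per gs_name. Gene sets with no matching row names, such as alanine_aspartate_and_glutamate_metabolism in this sample, come back as zero-row matrices and can be removed with Filter(nrow, mats) if they are not wanted.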
Another, tidyverse-y way:
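A hedged sketch of what a tidyverse version could look like. The inner_join() error in the question arises because a named `by` vector is read as c("column in x" = "column in y"): kegg (x) has no gene column, and the data frame built from all.deg has no gs_name column. Putting the kegg column on the left fixes the join. As in the sample data, this assumes human_gene_symbol is the column that matches the row names; deg_df, joined and dfs are illustrative names, and kegg is reduced to the columns used.

```r
library(dplyr)
library(tibble)

# all.deg exactly as posted in the question (5 genes x 5 samples)
all.deg <- structure(c(16.0169585624867, 14.3983080662428, 12.7844219145156,
12.6674945373237, 13.8584047354367, 13.563719599839, 13.6166993468069,
12.9748157402651, 12.7386065050292, 12.2201616898331, 11.3657998135948,
11.8253392160132, 12.1132082166185, 11.5123143882139, 10.2967924742924,
13.7513874043739, 13.2403954818698, 12.4196432226432, 12.4676109090624,
12.1390647972695, 12.3013113392588, 12.4867673484914, 11.3693921877853,
10.6359730348998, 10.0122721528039), dim = c(5L, 5L), dimnames = list(
c("FTL", "MIGA2", "HLA.A", "THBD", "CD74"), c("TCGA.2K.A9WE.01",
"TCGA.2Z.A9J1.01", "TCGA.2Z.A9J3.01", "TCGA.2Z.A9J6.01",
"TCGA.2Z.A9J7.01")))

# Reduced copy of kegg: only the columns used here
kegg <- data.frame(
  gs_name = c("adipocytokine_signaling_pathway",
              "adipocytokine_signaling_pathway",
              "alanine_aspartate_and_glutamate_metabolism",
              "alanine_aspartate_and_glutamate_metabolism"),
  gene_symbol = c("ACACB", "ACSL1", "CPS1", "DDO"),
  human_gene_symbol = c("ACACB", "FTL", "CPS1", "DDO"),
  stringsAsFactors = FALSE
)

# Move the row names into a regular column so they can be joined on
deg_df <- rownames_to_column(as.data.frame(all.deg), "gene")

# Left side of `by` names the kegg column, right side the deg_df column
# (assumption: human_gene_symbol is the column to match on)
joined <- inner_join(kegg, deg_df, by = c("human_gene_symbol" = "gene"))

# One data frame per gene set, named by gs_name
dfs <- split(joined, joined$gs_name)

dfs$adipocytokine_signaling_pathway
```

Because inner_join() keeps only matching rows, gene sets without any gene in all.deg do not appear in dfs at all, unlike the base R list, which keeps them as empty matrices.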