How to reshape a data frame with repeated rows so that its values become row names and column names

Posted by ncgqoxb0 on 2023-02-10 in Other

I have been struggling to reshape the following data frame:

geneSymbol <- c(rep("gene1",4),rep("gene2",4),rep("gene3",4))
Sample_name <- rep(c("sample1","sample2","sample3","sample4"),3)
log2FC <- c(1.5,-1.0,0.5,0.2,-0.3,-0.7,-0.12,0.33,0.64,-0.17,2.3,-1.7)
df <- data.frame(geneSymbol, Sample_name, log2FC)
> df
   geneSymbol Sample_name log2FC
1       gene1     sample1   1.50
2       gene1     sample2  -1.00
3       gene1     sample3   0.50
4       gene1     sample4   0.20
5       gene2     sample1  -0.30
6       gene2     sample2  -0.70
7       gene2     sample3  -0.12
8       gene2     sample4   0.33
9       gene3     sample1   0.64
10      gene3     sample2  -0.17
11      gene3     sample3   2.30
12      gene3     sample4  -1.70

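Here the "geneSymbol" and "Sample_name" columns each contain repeated values. I have been trying to reshape this data frame into one that uses "geneSymbol" as its row names and "Sample_name" as its column names, so that it looks like this: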

       sample1  sample2  sample3  sample4
gene1    1.50    -1.00     0.50     0.20
gene2   -0.30    -0.70    -0.12     0.33
gene3    0.64    -0.17     2.30    -1.70

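I created this table by hand, but I don't know which function I need to build this data frame or table from df, since my real data frame has hundreds of rows. I would be very grateful if anyone could help me with this.
Best wishes, TJ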


gjmwrych1#

xtabs(log2FC ~ geneSymbol + Sample_name, df)

          Sample_name
geneSymbol sample1 sample2 sample3 sample4
     gene1    1.50   -1.00    0.50    0.20
     gene2   -0.30   -0.70   -0.12    0.33
     gene3    0.64   -0.17    2.30   -1.70
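
Note that xtabs() returns a contingency table rather than a data frame. If a data frame with row names is needed, the following is a minimal sketch of one way to convert it, using the example data from the question (the object names tab and wide are only illustrative):

```r
# Rebuild the example data from the question
df <- data.frame(
  geneSymbol  = rep(c("gene1", "gene2", "gene3"), each = 4),
  Sample_name = rep(c("sample1", "sample2", "sample3", "sample4"), times = 3),
  log2FC      = c(1.5, -1.0, 0.5, 0.2, -0.3, -0.7, -0.12, 0.33, 0.64, -0.17, 2.3, -1.7)
)

# xtabs() returns an object of class "xtabs"/"table", not a data frame
tab <- xtabs(log2FC ~ geneSymbol + Sample_name, data = df)

# as.data.frame.matrix() keeps the wide layout; plain as.data.frame()
# would turn the table back into long format
wide <- as.data.frame.matrix(tab)
wide
```

Keep in mind that xtabs() sums the values when a gene/sample pair occurs more than once and fills missing pairs with 0, so this approach is probably safest when every pair appears exactly once, as in the example.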

fnx2tebb2#

Using acast from reshape2:

library(reshape2)
acast(df, geneSymbol ~ Sample_name, value.var = 'log2FC')
      sample1 sample2 sample3 sample4
gene1    1.50   -1.00    0.50    0.20
gene2   -0.30   -0.70   -0.12    0.33
gene3    0.64   -0.17    2.30   -1.70

w3nuxt5m3#

Using tidyr:

tidyr::pivot_wider(df, values_from = 'log2FC', names_from = 'Sample_name')

  geneSymbol sample1 sample2 sample3 sample4
  gene1         1.5    -1       0.5     0.2 
  gene2        -0.3    -0.7    -0.12    0.33
  gene3         0.64   -0.17    2.3    -1.7
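
pivot_wider() keeps geneSymbol as an ordinary column and returns a tibble, which does not carry row names. If the gene symbols should become row names as the question asks, one possible sketch (assuming the tibble package is also installed; the name wide is only illustrative) is:

```r
library(tidyr)
library(tibble)

# Rebuild the example data from the question
df <- data.frame(
  geneSymbol  = rep(c("gene1", "gene2", "gene3"), each = 4),
  Sample_name = rep(c("sample1", "sample2", "sample3", "sample4"), times = 3),
  log2FC      = c(1.5, -1.0, 0.5, 0.2, -0.3, -0.7, -0.12, 0.33, 0.64, -0.17, 2.3, -1.7)
)

# Spread the samples into columns, one row per gene
wide <- pivot_wider(df, names_from = Sample_name, values_from = log2FC)

# Move geneSymbol into the row names; the result is a plain data frame
wide <- column_to_rownames(wide, var = "geneSymbol")
wide
```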

plicqrtu4#

Here is the data.table counterpart, using dcast:

library(data.table)

setDT(df)
dcast(df, geneSymbol ~ Sample_name, value.var = "log2FC")
   geneSymbol sample1 sample2 sample3 sample4
1:      gene1    1.50   -1.00    0.50    0.20
2:      gene2   -0.30   -0.70   -0.12    0.33
3:      gene3    0.64   -0.17    2.30   -1.70
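
A data.table has no row names, so geneSymbol stays a regular column here. If a numeric matrix with gene symbols as row names is wanted (for a heatmap, say), a minimal sketch could look like the following. It uses as.data.table() instead of setDT() so the original df is left unmodified, and the names dt, wide_dt and m are only illustrative:

```r
library(data.table)

# Rebuild the example data from the question
df <- data.frame(
  geneSymbol  = rep(c("gene1", "gene2", "gene3"), each = 4),
  Sample_name = rep(c("sample1", "sample2", "sample3", "sample4"), times = 3),
  log2FC      = c(1.5, -1.0, 0.5, 0.2, -0.3, -0.7, -0.12, 0.33, 0.64, -0.17, 2.3, -1.7)
)

# Work on a copy so df itself is not converted by reference
dt <- as.data.table(df)
wide_dt <- dcast(dt, geneSymbol ~ Sample_name, value.var = "log2FC")

# The data.table method of as.matrix() can move a column into the row names
m <- as.matrix(wide_dt, rownames = "geneSymbol")
m
```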
