R: How can I quantify the similarity between plots that show a similar pattern?

9nvpjoqh  posted on 2023-09-27 in Other

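I am comparing the values in these two plots. As you can see, the values differ, but the two plots follow a similar pattern. In other words, what matters is the ordering of the genes within each group. How can I quantify this similarity?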

df

> dput(df)
structure(list(Gene = c("Gene 1", "Gene 2", "Gene 3", "Gene 4", 
"Gene 5", "Gene 6", "Gene 7", "Gene 8", "Gene 9", "Gene 10", 
"Gene 11", "Gene 12", "Gene 13", "Gene 14", "Gene 15", "Gene 16", 
"Gene 1", "Gene 2", "Gene 3", "Gene 4", "Gene 5", "Gene 6", "Gene 7", 
"Gene 8", "Gene 9", "Gene 10", "Gene 11", "Gene 12", "Gene 13", 
"Gene 14", "Gene 15", "Gene 16"), Percent = c(2.6, 15.1, 2.3, 
2.3, 3, 2.1, 3.6, 3.8, 9.2, 3.7, 7.2, 1.8, 3.2, 4.1, 7.2, 2.6, 
1.4, 8.1, 1.4, 1.3, 1.7, 1.5, 3, 2.3, 4.6, 2.2, 3.6, 1.1, 1.5, 
2, 2.5, 1), Study = c("PCAWG", "PCAWG", "PCAWG", "PCAWG", "PCAWG", 
"PCAWG", "PCAWG", "PCAWG", "PCAWG", "PCAWG", "PCAWG", "PCAWG", 
"PCAWG", "PCAWG", "PCAWG", "PCAWG", "TCGA", "TCGA", "TCGA", "TCGA", 
"TCGA", "TCGA", "TCGA", "TCGA", "TCGA", "TCGA", "TCGA", "TCGA", 
"TCGA", "TCGA", "TCGA", "TCGA")), class = "data.frame", row.names = c(NA, 
-32L))
5t7ly7z5  #1

You can compare the ranks of the genes within each study.

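# Sort by Study and Gene so the rows line up gene by gene (Gene sorts as text:
# "Gene 1", "Gene 10", ...), then rank Percent within each study.
df[with(df, order(Study, Gene)), ] |>
  with(by(Percent, Study, rank)) |> simplify2array()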
#       PCAWG TCGA
#  [1,]   5.5  4.5
#  [2,]  10.0 10.0
#  [3,]  13.5 14.0
#  [4,]   1.0  2.0
#  [5,]   8.0  6.5
#  [6,]  12.0  9.0
#  [7,]  13.5 12.0
#  [8,]   5.5  1.0
#  [9,]  16.0 16.0
# [10,]   3.5  4.5
# [11,]   3.5  3.0
# [12,]   7.0  8.0
# [13,]   2.0  6.5
# [14,]   9.0 13.0
# [15,]  11.0 11.0
# [16,]  15.0 15.0

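For example, squaring the row-wise differences between the two rank columns and taking the mean gives you something like a mean squared error.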

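# Per-gene difference between the two rank columns, squared, then averaged
# (requires the matrixStats package).
df[with(df, order(Study, Gene)), ] |>
  with(by(Percent, Study, rank)) |> simplify2array() |> 
  matrixStats::rowDiffs() |> base::`^`(2) |> mean()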
# [1] 4.65625
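
The pipeline above relies on sorting to line the genes up between the two studies. As a hedged alternative, here is a minimal base R sketch that matches genes explicitly by name with `merge`. The names `wide`, `Percent_PCAWG`, `Percent_TCGA` and `rank_mse` are chosen here for illustration and do not appear in the original answer.

```r
# Rebuild the data frame from the question so the example is self-contained.
df <- data.frame(
  Gene    = rep(paste("Gene", 1:16), times = 2),
  Percent = c(2.6, 15.1, 2.3, 2.3, 3, 2.1, 3.6, 3.8, 9.2, 3.7, 7.2, 1.8,
              3.2, 4.1, 7.2, 2.6,
              1.4, 8.1, 1.4, 1.3, 1.7, 1.5, 3, 2.3, 4.6, 2.2, 3.6, 1.1,
              1.5, 2, 2.5, 1),
  Study   = rep(c("PCAWG", "TCGA"), each = 16)
)

# One row per gene, one Percent column per study, joined on the gene name.
pcawg <- df[df$Study == "PCAWG", c("Gene", "Percent")]
tcga  <- df[df$Study == "TCGA",  c("Gene", "Percent")]
wide  <- merge(pcawg, tcga, by = "Gene", suffixes = c("_PCAWG", "_TCGA"))

# Rank within each study, then average the squared rank differences.
rank_mse <- mean((rank(wide$Percent_PCAWG) - rank(wide$Percent_TCGA))^2)
print(rank_mse)
```

This should reproduce the 4.65625 shown above without depending on the row order of `df` or on the matrixStats package.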
ddhy6vgd  #2

An obvious option is to use cor.test to check the correlation between the percentages of the matching genes:

cor.test(df$Percent[df$Study == "PCAWG"], df$Percent[df$Study == "TCGA"])
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  df$Percent[df$Study == "PCAWG"] and df$Percent[df$Study == "TCGA"]
#> t = 13.648, df = 14, p-value = 1.764e-09
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  0.8980195 0.9878583
#> sample estimates:
#>       cor 
#> 0.9644133

This gives a Pearson correlation coefficient of about 0.964, which is a very strong correlation indeed.
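
Because the question stresses the ordering of the genes rather than their raw values, a rank-based coefficient may be a closer fit than Pearson's. The following is a minimal sketch, assuming the rows of the two studies are already in the same gene order (as they are in the `dput` output in the question). The vector names `pcawg` and `tcga` are introduced here for illustration.

```r
# Percent values copied from the question, in Gene 1..16 order for each study.
pcawg <- c(2.6, 15.1, 2.3, 2.3, 3, 2.1, 3.6, 3.8, 9.2, 3.7, 7.2, 1.8,
           3.2, 4.1, 7.2, 2.6)
tcga  <- c(1.4, 8.1, 1.4, 1.3, 1.7, 1.5, 3, 2.3, 4.6, 2.2, 3.6, 1.1,
           1.5, 2, 2.5, 1)

# Spearman's rho: the Pearson correlation of the within-study ranks.
rho <- cor(pcawg, tcga, method = "spearman")

# Kendall's tau: based on concordant vs. discordant pairs of genes.
tau <- cor(pcawg, tcga, method = "kendall")

# Significance test for rho; exact = FALSE because the data contain ties.
sp_test <- cor.test(pcawg, tcga, method = "spearman", exact = FALSE)

print(c(spearman = rho, kendall = tau))
print(sp_test$p.value)
```

On these data Spearman's rho comes out at roughly 0.89, somewhat below the Pearson value, which is likely pulled up in part by the large Gene 2 percentages in both studies.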
