R: Matching between a data frame and a matrix

2g32fytz posted on 2023-04-09 in Other

I am working with the R programming language.
I have the following matrix:

set.seed(123)
mat <- matrix(ifelse(runif(100) < 0.5, 0, runif(100)), nrow = 10, ncol = 10,
              dimnames = list(c('aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'ggg', 'hhh', 'iii', 'jjj'),
                              c('111', '222', '333', '444', '555', '666', '777', '888', '999', '101010')))

mat <- t(apply(mat, 1, function(x) if(sum(x) > 1) x/sum(x) else x))

          111        222        333       444       555        666        777        888        999    101010
aaa 0.0000000 0.28047306 0.19428708 0.1856996 0.0000000 0.00000000 0.15062710 0.18891319 0.00000000 0.0000000
bbb 0.1409196 0.00000000 0.13541406 0.3774219 0.0000000 0.00000000 0.00000000 0.07783414 0.13229252 0.1361178
ccc 0.0000000 0.02648093 0.13420016 0.2935025 0.0000000 0.16917150 0.00000000 0.37664493 0.00000000 0.0000000
ddd 0.2549304 0.25312832 0.05869772 0.1968660 0.0000000 0.00000000 0.00000000 0.00000000 0.07078359 0.1655940
eee 0.3661311 0.00000000 0.28014223 0.0000000 0.0000000 0.08423207 0.26949462 0.00000000 0.00000000 0.0000000
fff 0.0000000 0.12631388 0.87368612 0.0000000 0.0000000 0.00000000 0.00000000 0.00000000 0.00000000 0.0000000
ggg 0.2768805 0.00000000 0.04669054 0.2488325 0.0000000 0.00000000 0.22416403 0.00000000 0.08024861 0.1231838
hhh 0.2727062 0.00000000 0.04078665 0.0000000 0.0000000 0.09716544 0.09905154 0.23736023 0.25292995 0.0000000
iii 0.1882696 0.00000000 0.00000000 0.0000000 0.0000000 0.20389182 0.18921225 0.00000000 0.41862635 0.0000000
jjj 0.0000000 0.23662317 0.00000000 0.0000000 0.4282713 0.00000000 0.00000000 0.00000000 0.00000000 0.3351055

I also have this "legend", which maps each column name to a color:

set.seed(123)
col_names <- c('111', '222', '333', '444', '555', '666', '777', '888', '999', '101010')
colors <- sample(c('red', 'green', 'blue'), 10, replace = TRUE)
color_df <- data.frame(col_names, colors)

   col_names colors
1        111   blue
2        222   blue
3        333   blue
4        444  green
5        555   blue
6        666  green
7        777  green
8        888  green
9        999   blue
10    101010    red

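**My question:** For each row of the matrix, I am trying to find the percentage that belongs to each color.

The final output should look like this (first row):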

   id      blue     green red
1 aaa 0.4747601 0.5252399   0

I tried to do this with the following code:

# Match colors with matrix columns
col_colors <- color_df$colors[match(colnames(mat), color_df$col_names)]

# Calculate percentage for each color
color_perc <- t(apply(mat, 1, function(x) {
  c(
    blue = sum(x[col_colors == "blue"]) * 100,
    green = sum(x[col_colors == "green"]) * 100,
    red = sum(x[col_colors == "red"]) * 100
  )
}))

# Combine with row names
final <- data.frame(id = rownames(mat), color_perc)

The result looks like this:

     id      blue    green      red
aaa aaa  47.47601 52.52399  0.00000
bbb bbb  40.86262 45.52560 13.61178
ccc ccc  16.06811 83.93189  0.00000
ddd ddd  63.75400 19.68660 16.55940
eee eee  64.62733 35.37267  0.00000
fff fff 100.00000  0.00000  0.00000
ggg ggg  40.38197 47.29965 12.31838
hhh hhh  56.64228 43.35772  0.00000
iii iii  60.68959 39.31041  0.00000
jjj jjj  66.48945  0.00000 33.51055

Can someone tell me whether I am doing this correctly?

Thanks!

Answer #1, by sqougxex

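The OP's approach is correct, though we can do this more efficiently with `split` and `rowSums` instead of looping over every row. That is: `split` the 'col_names' by 'colors', loop over the resulting list with `sapply`, extract the matching columns of `mat`, take their `rowSums`, then multiply by 100 and `round`.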

round(100 * sapply(with(color_df, split(col_names, colors)),
     \(nm) rowSums(mat[, nm, drop = FALSE])), 3)
Output:
       blue  green    red
aaa  47.476 52.524  0.000
bbb  40.863 45.526 13.612
ccc  16.068 83.932  0.000
ddd  63.754 19.687 16.559
eee  64.627 35.373  0.000
fff 100.000  0.000  0.000
ggg  40.382 47.300 12.318
hhh  56.642 43.358  0.000
iii  60.690 39.310  0.000
jjj  66.489  0.000 33.511
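
To make the steps in this answer easier to follow, here is a minimal sketch that runs them one at a time and checks that they agree with the row-wise approach from the question. It is an illustration rather than part of the original answer: the colors are hard-coded from the legend table instead of being re-sampled, `stringsAsFactors = FALSE` is added so the sketch also behaves on R versions before 4.0, and the names `groups`, `by_split`, `by_apply`, `by_rowsum` and `agree` are made up for this example. The `rowsum()` variant at the end is an extra base R alternative, not something the answer proposes.

```r
# Rebuild the question's inputs so the sketch runs on its own.
set.seed(123)
mat <- matrix(ifelse(runif(100) < 0.5, 0, runif(100)), nrow = 10, ncol = 10,
              dimnames = list(c('aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'ggg', 'hhh', 'iii', 'jjj'),
                              c('111', '222', '333', '444', '555', '666', '777', '888', '999', '101010')))
mat <- t(apply(mat, 1, function(x) if (sum(x) > 1) x / sum(x) else x))

# Assumption: colors are copied from the legend table instead of re-sampled.
color_df <- data.frame(
  col_names = colnames(mat),
  colors = c('blue', 'blue', 'blue', 'green', 'blue',
             'green', 'green', 'green', 'blue', 'red'),
  stringsAsFactors = FALSE
)

# Step 1: split() groups the column names by color, giving a named list
# with one character vector per color.
groups <- with(color_df, split(col_names, colors))

# Step 2: for each color, pick the matching columns and sum them per row.
# drop = FALSE keeps a one-column selection (here 'red') as a matrix.
by_split <- 100 * sapply(groups, function(nm) rowSums(mat[, nm, drop = FALSE]))

# The question's row-wise idea, written with tapply() for brevity.
col_colors <- color_df$colors[match(colnames(mat), color_df$col_names)]
by_apply <- t(apply(mat, 1, function(x) 100 * tapply(x, col_colors, sum)))

# Alternative: rowsum() sums the rows of t(mat) (the columns of mat) by group.
by_rowsum <- 100 * t(rowsum(t(mat), col_colors))

# All routes should agree up to floating-point error.
agree <- isTRUE(all.equal(by_split, by_apply, check.attributes = FALSE)) &&
  isTRUE(all.equal(by_split, by_rowsum, check.attributes = FALSE))
print(agree)
```

Note that the desired output in the question shows proportions (0.4747601 for blue in row aaa) rather than percentages, so if that scale is what is wanted, `by_split / 100` (or simply leaving out the factor of 100) gives it.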
