I am using the R programming language.
I have the following matrix:
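set.seed(123)
# 10 x 10 matrix: roughly half the entries are 0, the rest are uniform(0, 1) draws
mat <- matrix(ifelse(runif(100) < 0.5, 0, runif(100)), nrow = 10, ncol = 10,
              dimnames = list(c('aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'ggg', 'hhh', 'iii', 'jjj'),
                              c('111', '222', '333', '444', '555', '666', '777', '888', '999', '101010')))
# Rescale each row so it sums to 1 (only rows whose sum exceeds 1 are rescaled)
mat <- t(apply(mat, 1, function(x) if (sum(x) > 1) x / sum(x) else x))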
111 222 333 444 555 666 777 888 999 101010
aaa 0.0000000 0.28047306 0.19428708 0.1856996 0.0000000 0.00000000 0.15062710 0.18891319 0.00000000 0.0000000
bbb 0.1409196 0.00000000 0.13541406 0.3774219 0.0000000 0.00000000 0.00000000 0.07783414 0.13229252 0.1361178
ccc 0.0000000 0.02648093 0.13420016 0.2935025 0.0000000 0.16917150 0.00000000 0.37664493 0.00000000 0.0000000
ddd 0.2549304 0.25312832 0.05869772 0.1968660 0.0000000 0.00000000 0.00000000 0.00000000 0.07078359 0.1655940
eee 0.3661311 0.00000000 0.28014223 0.0000000 0.0000000 0.08423207 0.26949462 0.00000000 0.00000000 0.0000000
fff 0.0000000 0.12631388 0.87368612 0.0000000 0.0000000 0.00000000 0.00000000 0.00000000 0.00000000 0.0000000
ggg 0.2768805 0.00000000 0.04669054 0.2488325 0.0000000 0.00000000 0.22416403 0.00000000 0.08024861 0.1231838
hhh 0.2727062 0.00000000 0.04078665 0.0000000 0.0000000 0.09716544 0.09905154 0.23736023 0.25292995 0.0000000
iii 0.1882696 0.00000000 0.00000000 0.0000000 0.0000000 0.20389182 0.18921225 0.00000000 0.41862635 0.0000000
jjj 0.0000000 0.23662317 0.00000000 0.0000000 0.4282713 0.00000000 0.00000000 0.00000000 0.00000000 0.3351055
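The normalization step rescales a row only when its sum exceeds 1; a row whose entries total 1 or less is left unchanged, so its color percentages would not add up to 100. A minimal sketch on a small hypothetical matrix (`toy` and `toy_norm` are illustrative names, not from the question) shows this behavior:

```r
# Hypothetical 3 x 3 matrix, not the question's data
toy <- matrix(c(0.5, 1.0, 0.5,
                0.1, 0.2, 0.3,
                0.0, 2.0, 2.0),
              nrow = 3, byrow = TRUE)

# Same rule as the question: rescale only rows whose sum exceeds 1
toy_norm <- t(apply(toy, 1, function(x) if (sum(x) > 1) x / sum(x) else x))

rowSums(toy_norm)
# rows 1 and 3 now sum to 1; row 2 keeps its original total of 0.6
```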
I also have this "legend":
set.seed(123)
col_names <- c('111', '222', '333', '444', '555', '666', '777', '888', '999', '101010')
colors <- sample(c('red', 'green', 'blue'), 10, replace = TRUE)
color_df <- data.frame(col_names, colors)
col_names colors
1 111 blue
2 222 blue
3 333 blue
4 444 green
5 555 blue
6 666 green
7 777 green
8 888 green
9 999 blue
10 101010 red
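**My question:** I am trying to find, for each row of the matrix, the percentage of that row that belongs to each color.
The final output should look like this (first row):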
id blue green red
1 aaa 0.4747601 0.5252399 0
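The expected values for row aaa come from summing that row's entries over the columns of each color in the legend (blue: 111, 222, 333, 555, 999; green: 444, 666, 777, 888; red: 101010). In general, for row $i$ and color $c$, the share is $p_{i,c} = \sum_{j:\,\mathrm{color}(j) = c} m_{ij}$, which for row aaa gives:

$$
\begin{aligned}
p_{\text{aaa},\text{blue}}  &= 0 + 0.28047306 + 0.19428708 + 0 + 0 = 0.47476014\\
p_{\text{aaa},\text{green}} &= 0.1856996 + 0 + 0.15062710 + 0.18891319 = 0.52523989\\
p_{\text{aaa},\text{red}}   &= 0
\end{aligned}
$$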
I tried to do this with the following code:
# Match colors with matrix columns
col_colors <- color_df$colors[match(colnames(mat), color_df$col_names)]
# Calculate percentage for each color
color_perc <- t(apply(mat, 1, function(x) {
c(
blue = sum(x[col_colors == "blue"]) * 100,
green = sum(x[col_colors == "green"]) * 100,
red = sum(x[col_colors == "red"]) * 100
)
}))
# Combine with row names
final <- data.frame(id = rownames(mat), color_perc)
The result looks like this:
id blue green red
aaa aaa 47.47601 52.52399 0.00000
bbb bbb 40.86262 45.52560 13.61178
ccc ccc 16.06811 83.93189 0.00000
ddd ddd 63.75400 19.68660 16.55940
eee eee 64.62733 35.37267 0.00000
fff fff 100.00000 0.00000 0.00000
ggg ggg 40.38197 47.29965 12.31838
hhh hhh 56.64228 43.35772 0.00000
iii iii 60.68959 39.31041 0.00000
jjj jjj 66.48945 0.00000 33.51055
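The printed result matches the target up to a factor of 100: row aaa gives 47.47601 and 52.52399, i.e. 0.4747601 and 0.5252399 expressed as percentages. One hedged way to cross-check the whole table is to recompute it with a different technique, a matrix product against a 0/1 column-to-color indicator matrix (a swapped-in method, not the one used in the question or the answer), and to confirm that each row's three percentages add up to 100 times that row's total, which must hold because every column has exactly one color. The sketch rebuilds `mat` and `color_df` from the question's code; `lvls`, `ind` and `via_product` are illustrative names.

```r
# Rebuild the question's data (same code and seeds as in the question)
set.seed(123)
mat <- matrix(ifelse(runif(100) < 0.5, 0, runif(100)), nrow = 10, ncol = 10,
              dimnames = list(c('aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'ggg', 'hhh', 'iii', 'jjj'),
                              c('111', '222', '333', '444', '555', '666', '777', '888', '999', '101010')))
mat <- t(apply(mat, 1, function(x) if (sum(x) > 1) x / sum(x) else x))

set.seed(123)
col_names <- c('111', '222', '333', '444', '555', '666', '777', '888', '999', '101010')
colors <- sample(c('red', 'green', 'blue'), 10, replace = TRUE)
color_df <- data.frame(col_names, colors)

# The question's row-by-row approach
col_colors <- as.character(color_df$colors[match(colnames(mat), color_df$col_names)])
color_perc <- t(apply(mat, 1, function(x) c(
  blue  = sum(x[col_colors == "blue"])  * 100,
  green = sum(x[col_colors == "green"]) * 100,
  red   = sum(x[col_colors == "red"])   * 100
)))

# Cross-check: 10 x 3 indicator matrix, ind[j, k] = 1 if column j has color lvls[k]
lvls <- c("blue", "green", "red")
ind <- outer(col_colors, lvls, "==") * 1
colnames(ind) <- lvls
via_product <- mat %*% ind * 100

# Both methods should agree up to floating-point tolerance
stopifnot(isTRUE(all.equal(unname(color_perc), unname(via_product))))
# Each row's percentages add up to 100 * (row total of mat)
stopifnot(isTRUE(all.equal(unname(rowSums(color_perc)), unname(100 * rowSums(mat)))))
```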
Can someone tell me whether I am doing this correctly?
Thanks!
1 Answer
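The OP's approach is correct, although this can be done more efficiently with `split` and `rowSums` instead of looping over every row: `split` the 'col_names' by 'colors', loop over the resulting `list` with `sapply`, extract the corresponding `mat` columns, take their `rowSums`, then multiply by 100 and `round`.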
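The answer's own code is not shown here. The following is a minimal sketch of the steps it describes, not the answerer's exact code; the names `cols_by_color`, `pct` and `final2`, and rounding to 2 decimal places, are illustrative assumptions.

```r
# Rebuild the question's data (same code and seeds as in the question)
set.seed(123)
mat <- matrix(ifelse(runif(100) < 0.5, 0, runif(100)), nrow = 10, ncol = 10,
              dimnames = list(c('aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'ggg', 'hhh', 'iii', 'jjj'),
                              c('111', '222', '333', '444', '555', '666', '777', '888', '999', '101010')))
mat <- t(apply(mat, 1, function(x) if (sum(x) > 1) x / sum(x) else x))

set.seed(123)
col_names <- c('111', '222', '333', '444', '555', '666', '777', '888', '999', '101010')
colors <- sample(c('red', 'green', 'blue'), 10, replace = TRUE)
color_df <- data.frame(col_names, colors)

# 1. Split the column names by color: a named list with elements blue, green, red.
#    as.character() guards against factor columns (the data.frame default before
#    R 4.0), which would otherwise index mat by integer code instead of by name.
cols_by_color <- split(as.character(color_df$col_names), color_df$colors)

# 2. For each color, extract those columns of mat and sum across each row;
#    drop = FALSE keeps a single-column selection (e.g. red) as a matrix.
pct <- sapply(cols_by_color, function(cols) rowSums(mat[, cols, drop = FALSE])) * 100

# 3. Round for display (2 decimals is an arbitrary choice) and add an id column
final2 <- data.frame(id = rownames(pct), round(pct, 2), row.names = NULL)
final2
```

This loops only over the three colors and leaves the per-row summing to `rowSums()`, which is presumably the efficiency gain the answer has in mind. Passing `row.names = NULL` also avoids the duplicated `aaa aaa` labels seen in the question's output.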