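I want to express values as a proportion of each sample's total (100%). My df contains the column `Project`, by which I want to group the column `n` in order to normalize the values of `n` to 100%.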
library("dplyr")
ex %>%
dplyr::group_by(Project) %>%
scale(n) -> perc
For the data I provided, I would expect the output `perc` to be
| Project | reaction | norm |
| -- | -- | -- |
| Ga0598239 | arsenate-reduction | 0.0312 |
| Ga0598239 | carbon-fixation | 0.003 |

and so on.
> dput(ex)
structure(list(Project = c("Ga0598239", "Ga0598239", "Ga0598239",
"Ga0598239", "Ga0598239", "Ga0598239", "Ga0598239", "Ga0598239",
"Ga0598239", "Ga0598239", "Ga0598239", "Ga0598239", "Ga0598239",
"Ga0598239", "Ga0598240", "Ga0598240", "Ga0598240", "Ga0598240",
"Ga0598240", "Ga0598240"), reaction = c("arsenate-reduction",
"carbon-fixation", "formaldehyde-oxidation", "halogenated-compounds-breakdown",
"hydrogen-oxidation", "iron-oxidation", "iron-reduction", "manganese-oxidation",
"methanol-oxidation", "selenate-reduction", "sulfide-oxidation",
"sulfite-reduction", "sulfur-oxidation", "thiosulfate-disproportionation",
"arsenate-reduction", "carbon-fixation", "formaldehyde-oxidation",
"halogenated-compounds-breakdown", "hydrogen-oxidation", "iron-oxidation"
), n = c(103L, 11L, 157L, 90L, 2296L, 85L, 33L, 156L, 17L, 38L,
8L, 9L, 259L, 13L, 90L, 21L, 202L, 81L, 2090L, 73L)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), groups = structure(list(
Project = c("Ga0598239", "Ga0598239", "Ga0598239", "Ga0598239",
"Ga0598239", "Ga0598239", "Ga0598239", "Ga0598239", "Ga0598239",
"Ga0598239", "Ga0598239", "Ga0598239", "Ga0598239", "Ga0598239",
"Ga0598240", "Ga0598240", "Ga0598240", "Ga0598240", "Ga0598240",
"Ga0598240"), reaction = c("arsenate-reduction", "carbon-fixation",
"formaldehyde-oxidation", "halogenated-compounds-breakdown",
"hydrogen-oxidation", "iron-oxidation", "iron-reduction",
"manganese-oxidation", "methanol-oxidation", "selenate-reduction",
"sulfide-oxidation", "sulfite-reduction", "sulfur-oxidation",
"thiosulfate-disproportionation", "arsenate-reduction", "carbon-fixation",
"formaldehyde-oxidation", "halogenated-compounds-breakdown",
"hydrogen-oxidation", "iron-oxidation"), .rows = structure(list(
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L,
14L, 15L, 16L, 17L, 18L, 19L, 20L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -20L), .drop = TRUE))
1 Answer

z0qdvdin #1
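Note that the data you provided is already grouped (the `groups` attribute in the `dput` output lists both `Project` and `reaction`), so `ungroup()` it first! Then group by `Project` alone and divide each `n` by its group total, which is probably what you want.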
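A minimal sketch of that approach, assuming the goal is each `n` as a fraction of its `Project` total. The column name `norm` and the compact reconstruction of `ex` are illustrative assumptions, not part of the original post. Note that `scale()` centers and rescales values to z-scores rather than computing proportions, so it is replaced here by `n / sum(n)`.

```r
library(dplyr)

# Rebuild the question's data compactly (same values as the dput output),
# grouped by Project and reaction as in the original object.
reactions <- c(
  "arsenate-reduction", "carbon-fixation", "formaldehyde-oxidation",
  "halogenated-compounds-breakdown", "hydrogen-oxidation", "iron-oxidation",
  "iron-reduction", "manganese-oxidation", "methanol-oxidation",
  "selenate-reduction", "sulfide-oxidation", "sulfite-reduction",
  "sulfur-oxidation", "thiosulfate-disproportionation"
)
ex <- tibble(
  Project  = rep(c("Ga0598239", "Ga0598240"), times = c(14, 6)),
  reaction = c(reactions, reactions[1:6]),
  n = c(103L, 11L, 157L, 90L, 2296L, 85L, 33L, 156L, 17L, 38L,
        8L, 9L, 259L, 13L, 90L, 21L, 202L, 81L, 2090L, 73L)
) %>%
  group_by(Project, reaction)

perc <- ex %>%
  ungroup() %>%                    # drop the existing Project/reaction grouping
  group_by(Project) %>%            # regroup by sample only
  mutate(norm = n / sum(n)) %>%    # share of each reaction within its Project
  ungroup()                        # return an ungrouped tibble

perc
```

With these data the `Ga0598239` total is 3275, so arsenate-reduction comes out as 103 / 3275 ≈ 0.0315, and the `norm` values within each project sum to 1. Multiply by 100 if you want percentages rather than fractions.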