R: Calculating relative percentages by group

Posted by 2jcobegt on 2023-02-06 in Other

I have the following dataset:

my_data = structure(list(state = c("State A", "State A", "State A", "State A", 
"State B", "State B", "State B", "State B", "State A", "State A", 
"State A", "State A", "State B", "State B", "State B", "State B"
), city = c("city 1", "city 1", "city 2", "city 2", "city 3", 
"city 3", "city 4", "city 4", "city 1", "city 1", "city 2", "city 2", 
"city 3", "city 3", "city 4", "city 4"), vaccine = c("yes", "no", 
"yes", "no", "yes", "no", "yes", "no", "yes", "no", "yes", "no", 
"yes", "no", "yes", "no"), counts = c(1221, 2233, 1344, 887, 
9862, 2122, 8772, 2341, 1221, 2233, 1344, 887, 9862, 2122, 8772, 
2341), year = c(2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 
2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022)), row.names = c(NA, 
-16L), class = "data.frame")
My question: for each city, I want to find the percentage of the population that was vaccinated in each year.

The final result might look something like this (I just made up the numbers):

     state   city vaccine Relative_Percentage year
1  State A city 1     yes                 0.6 2021
2  State A city 1      no                 0.4 2021
3  State A city 2     yes                 0.3 2021
4  State A city 2      no                 0.7 2021

Following this post as an example (Relative frequencies / proportions with dplyr), I tried the following code:

library(dplyr)
my_data %>%
  group_by(year, state, city, vaccine) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))

But I don't think my code is correct: all of the percentages are exactly 0.5.

`summarise()` has grouped output by 'year', 'state', 'city'. You can override using the `.groups` argument.
# A tibble: 16 x 6
# Groups:   year, state, city [8]
    year state   city   vaccine     n  freq
   <dbl> <chr>   <chr>  <chr>   <int> <dbl>
 1  2021 State A city 1 no          1   0.5
 2  2021 State A city 1 yes         1   0.5

Can someone please show me how to fix this?
Thanks!

Answer #1 (2ic8powd)

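"For each city, I want to find the percentage of the population that was vaccinated in each year."
Don't include vaccine in the grouping. You can keep state in the grouping to distinguish between cities. Also, if you want percentages of counts, you need to compute them inside summarize: once summarize has dropped counts, it is no longer available to later steps. Using n in the calculation of freq only gives the share of rows in the data frame, not the share of people vaccinated.
Since you want to know which vaccine status has which frequency, add vaccine to the summary as well.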

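# Group by city within state and year, but not by vaccine.
# Note: with dplyr >= 1.1.0, returning several rows per group from summarise()
# triggers a deprecation warning; reframe() is the suggested replacement.
my_data %>%
  group_by(year, state, city) %>%
  summarise(vaccine, n = n(), freq = counts / sum(counts), .groups = "drop")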
# # A tibble: 16 × 6
#     year state   city   vaccine     n  freq
#    <dbl> <chr>   <chr>  <chr>   <int> <dbl>
#  1  2021 State A city 1 yes         2 0.354
#  2  2021 State A city 1 no          2 0.646
#  3  2021 State A city 2 yes         2 0.602
#  4  2021 State A city 2 no          2 0.398
#  5  2021 State B city 3 yes         2 0.823
#  6  2021 State B city 3 no          2 0.177
#  7  2021 State B city 4 yes         2 0.789
#  8  2021 State B city 4 no          2 0.211
#  9  2022 State A city 1 yes         2 0.354
# 10  2022 State A city 1 no          2 0.646
# 11  2022 State A city 2 yes         2 0.602
# 12  2022 State A city 2 no          2 0.398
# 13  2022 State B city 3 yes         2 0.823
# 14  2022 State B city 3 no          2 0.177
# 15  2022 State B city 4 yes         2 0.789
# 16  2022 State B city 4 no          2 0.211
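To see why the attempt in the question returned 0.5 everywhere, it may help to look at what n() actually counts. The sketch below rebuilds the same 16 rows compactly (the rep() construction and the names rows_per_cell and attempt are only illustrative, not part of the original post) and reruns the question's pipeline.

```r
library(dplyr)

# Same 16 rows as my_data in the question, rebuilt compactly (illustrative).
my_data <- data.frame(
  state   = rep(rep(c("State A", "State B"), each = 4), times = 2),
  city    = rep(rep(paste("city", 1:4), each = 2), times = 2),
  vaccine = rep(c("yes", "no"), times = 8),
  counts  = rep(c(1221, 2233, 1344, 887, 9862, 2122, 8772, 2341), times = 2),
  year    = rep(c(2021, 2022), each = 8)
)

# n counts rows per (year, state, city, vaccine) cell, not people.
rows_per_cell <- my_data %>%
  count(year, state, city, vaccine)

# The question's pipeline: after summarise() the data is still grouped by
# year, state, city, so each group holds two rows with n = 1.
attempt <- my_data %>%
  group_by(year, state, city, vaccine) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()

unique(rows_per_cell$n)  # every cell is a single row
unique(attempt$freq)     # so freq is always 1 / (1 + 1)
```

Because the counts column never enters the calculation, the result cannot depend on how many people were vaccinated.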

Frankly, we don't really "need" summarize here. Since the counts already appear to be aggregated, we can just add the column with mutate.

my_data %>%
  group_by(year, state, city) %>%
  mutate(freq = counts / sum(counts)) %>%
  ungroup()
# # A tibble: 16 × 6
#    state   city   vaccine counts  year  freq
#    <chr>   <chr>  <chr>    <dbl> <dbl> <dbl>
#  1 State A city 1 yes       1221  2021 0.354
#  2 State A city 1 no        2233  2021 0.646
#  3 State A city 2 yes       1344  2021 0.602
#  4 State A city 2 no         887  2021 0.398
#  5 State B city 3 yes       9862  2021 0.823
#  6 State B city 3 no        2122  2021 0.177
#  7 State B city 4 yes       8772  2021 0.789
#  8 State B city 4 no        2341  2021 0.211
#  9 State A city 1 yes       1221  2022 0.354
# 10 State A city 1 no        2233  2022 0.646
# 11 State A city 2 yes       1344  2022 0.602
# 12 State A city 2 no         887  2022 0.398
# 13 State B city 3 yes       9862  2022 0.823
# 14 State B city 3 no        2122  2022 0.177
# 15 State B city 4 yes       8772  2022 0.789
# 16 State B city 4 no        2341  2022 0.211
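As a possible follow-up, the sketch below checks that the shares add up to 1 within each year/state/city group and then keeps only the "yes" rows, which is one way to read "percentage of the population vaccinated". The data is rebuilt compactly so the block runs on its own, and the names result, check, vaccinated and pct_vaccinated are illustrative assumptions rather than anything from the original answer.

```r
library(dplyr)

# Same 16 rows as my_data in the question, rebuilt compactly (illustrative).
my_data <- data.frame(
  state   = rep(rep(c("State A", "State B"), each = 4), times = 2),
  city    = rep(rep(paste("city", 1:4), each = 2), times = 2),
  vaccine = rep(c("yes", "no"), times = 8),
  counts  = rep(c(1221, 2233, 1344, 887, 9862, 2122, 8772, 2341), times = 2),
  year    = rep(c(2021, 2022), each = 8)
)

# Share of each vaccine status within its year/state/city group.
result <- my_data %>%
  group_by(year, state, city) %>%
  mutate(freq = counts / sum(counts)) %>%
  ungroup()

# Sanity check: the shares in every group should sum to 1.
check <- result %>%
  group_by(year, state, city) %>%
  summarise(total = sum(freq), .groups = "drop")
stopifnot(isTRUE(all.equal(check$total, rep(1, nrow(check)))))

# Keep only the vaccinated share, expressed as a percentage.
vaccinated <- result %>%
  filter(vaccine == "yes") %>%
  mutate(pct_vaccinated = round(100 * freq, 1)) %>%
  select(year, state, city, pct_vaccinated)

vaccinated
```

This yields one row per city and year, which may be closer to the table the question sketches than the two-row-per-city output.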
