R: Getting row percentages by group for geom_col in ggplot

vlurs2pr  posted on 2023-02-10 in Other

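I'm trying to compute row percentages of demographics across score levels. In my data, that would be the % of White respondents (or % Black, or % male, or % with education level 2, and so on) who have a score of 0 (or 1, 2, or 3). I then want to use those percentages to build one large figure.
For example, in the sample data below, among race == 1 (i.e., White), 8.33% have a score of 0, 25% a score of 1, 25% a score of 2, and 41.67% a score of 3.
The end goal is some kind of bar chart with the 4 levels of 'score' across the x-axis and the various demographic comparisons running down the y-axis. Something that looks like the visual below, but with the levels of 'score' rather than education levels across the top:

[image not available]
I already have some code to generate the actual figure, which I have used in other examples with external, precomputed percentages: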

ggplot(data, aes(x = percent, y = category, fill = group)) +
  geom_col(orientation = "y", width = .9) +
  facet_grid(group~score_var, 
             scales = "free_y", space = "free_y") +
  labs(title = "Demographic breakdown of 'Score'") +
  theme_bw()

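I'm struggling to find the best way to compute these row percentages, presumably with group_by() and summarize(), and then to store or arrange them in a form that can be plotted.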

d <- structure(list(race = c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3, 1, 
1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 
3, 3), gender = c(0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 
0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1
), education = c(1, 3, 3, 2, 1, 3, 2, 3, 4, 4, 2, 3, 3, 2, 3, 
4, 1, 3, 1, 3, 3, 2, 1, 3, 2, 3, 4, 4, 2, 3, 3, 2, 3, 4, 1, 3
), score = c(1, 2, 2, 1, 2, 3, 3, 2, 0, 0, 1, 2, 1, 3, 0, 0, 
3, 3, 3, 3, 3, 3, 3, 3, 2, 1, 2, 3, 1, 3, 3, 0, 1, 2, 2, 0)), row.names = c(NA, 
-36L), class = c("tbl_df", "tbl", "data.frame"))
# The readr `spec` and `problems = <pointer: ...>` attributes from the original
# dput() output are dropped here; the pointer cannot be parsed as R code.
Answer 1 (flseospp)

This might help you get started:

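library(dplyr)
library(ggplot2)
library(tidyr)  # needed for pivot_longer() in the edit further down

# `d` is the sample data from the question
prop <- d %>% 
    mutate(race = factor(race, levels = c(1, 2, 3), labels = c("White", "Black", "Others"))) %>% 
    group_by(race) %>% 
    mutate(race_n  = n()) %>%   # group size, used as the row-percentage denominator
    group_by(race, score) %>% 
    summarise(percent = round(100*n()/race_n[1], 1), .groups = "drop")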

prop %>% 
    ggplot(aes(x = percent, y = score, fill = race)) +
    geom_col(orientation = "y", width = .9) +
    geom_text(aes(label = percent), hjust = 1)+
    facet_grid(~race) +
    labs(title = "Demographic breakdown of 'Score'") +
    theme_bw()
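
As a quick cross-check of the row percentages, here is a minimal base R sketch that does not depend on dplyr. It rebuilds only the race and score columns of the question's sample data; the names `counts`, `row_pct`, and `row_pct_long` are illustrative and not part of the original answer.

```r
# Race and score columns rebuilt from the sample data in the question
race  <- rep(c(1, 1, 2, 2, 3, 3), times = 6)
score <- c(1, 2, 2, 1, 2, 3, 3, 2, 0, 0, 1, 2, 1, 3, 0, 0, 3, 3,
           3, 3, 3, 3, 3, 3, 2, 1, 2, 3, 1, 3, 3, 0, 1, 2, 2, 0)

# Cross-tabulate, then turn each row (one race) into percentages
counts  <- table(race = race, score = score)
row_pct <- round(100 * prop.table(counts, margin = 1), 2)

# Long format: one row per race/score pair, which geom_col() can plot
row_pct_long <- as.data.frame(row_pct, responseName = "percent")
```

The row for race 1 in `row_pct` should match the figures quoted in the question (8.33, 25, 25, 41.67). `margin = 1` is what makes these row percentages; `margin = 2` would give column percentages instead.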

Edit

Putting all the characteristics together gives one larger figure:

# `d` is the sample data from the question; pivot_longer() comes from tidyr
df <- d %>% mutate(
        gender = factor(2-gender), 
        race = factor(race), 
        education = factor(education)) %>%
    pivot_longer(!score, names_to = "character", values_to = "levels")

df %>% group_by(character, levels) %>% 
    mutate(group_n  = n()) %>% 
    group_by(character, levels, score) %>% 
    summarise(percent = round(100*n()/group_n[1], 1)) %>% 
    ggplot(aes(x = percent, y = score, fill = character)) +
    geom_col(orientation = "y", width = .9) +
    geom_text(aes(label = percent), hjust = 1)+
    facet_grid(character ~ levels) +
    labs(title = "Demographic breakdown of 'Score'") +
    theme_bw()

Note: I have changed the coding of gender. It is recoded as 2 - gender, so 0 becomes 2 and 1 stays 1.

Answer 2 (db2dz4w8)

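Taking inspiration from @Zhiqiang Wang's excellent first pass, I finally came up with a solution. I still need to change the order of the labels (put the education levels in order and move the race variable to the top of the figure), but this is basically what I had envisioned.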

d_test <- d %>% mutate(
        gender = factor(2-gender), 
        race = factor(race), 
        education = factor(education)) %>%
    pivot_longer(!score, names_to = "group", values_to = "levels")

d_test <- d_test %>% group_by(group, levels) %>% 
    mutate(group_n  = n()) %>% 
    group_by(group, levels, score) %>% 
    summarise(percent = round(100*n()/group_n[1], 1))

d_test <- d_test %>% 
  mutate(var = case_when(group == "gender" & levels == 1 ~ "female",
                         group == "gender" & levels == 2 ~ "male",
                         group == "race" & levels == 1 ~ "white",
                         group == "race" & levels == 2 ~ "black",
                         group == "race" & levels == 3 ~ "hispanic",
                         group == "education" & levels == 1 ~ "dropout HS",
                         group == "education" & levels == 2 ~ "grad HS",
                         group == "education" & levels == 3 ~ "some coll",
                         group == "education" & levels == 4 ~ "grad coll"))

ggplot(d_test, aes(x = percent, y = var, fill = group)) +
  geom_col(orientation = "y", width = .9) +
  facet_grid(group ~ score,
               scales = "free_y", space = "free_y") +
  labs(title = "Demographic breakdown of 'Score'",
         y = "",
         x = "Percent") +
  theme_minimal() +
  theme(legend.position = "none",
        strip.text.y = element_blank())
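
One possible way to handle the remaining label ordering is to convert the plotted columns to factors with explicit levels, since ggplot2 orders discrete axes and facets by factor level (character columns are sorted alphabetically). This is only a sketch: `labels_df` is a hypothetical stand-in for the `group` and `var` columns of `d_test`, and the chosen order is an assumption about the intended layout.

```r
# Hypothetical stand-in for the `group` and `var` columns of d_test
labels_df <- data.frame(
  group = c("gender", "gender", "race", "race", "race",
            "education", "education", "education", "education"),
  var   = c("female", "male", "white", "black", "hispanic",
            "dropout HS", "grad HS", "some coll", "grad coll")
)

# Facet rows follow factor levels top to bottom: race first
labels_df$group <- factor(labels_df$group,
                          levels = c("race", "gender", "education"))

# A discrete y-axis puts the first level at the bottom, so reverse
# the desired top-to-bottom order
top_to_bottom <- c("white", "black", "hispanic", "female", "male",
                   "dropout HS", "grad HS", "some coll", "grad coll")
labels_df$var <- factor(labels_df$var, levels = rev(top_to_bottom))
```

Applying the same two `factor()` calls to `d_test$group` and `d_test$var` before the `ggplot()` call should give the facets and y-axis labels in that order.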
