R语言 ggplot2中更简单的群体金字塔

slsn1g29  于 2023-01-18  发布在  其他
关注(0)|答案(4)|浏览(132)

我想用ggplot2创建一个人口金字塔。这个问题被问到before,但我相信解决方案肯定要简单得多。

test <- (data.frame(v=rnorm(1000), g=c('M','F')))
require(ggplot2)
ggplot(data=test, aes(x=v)) + 
    geom_histogram() + 
    coord_flip() + 
    facet_grid(. ~ g)

生成了这个图像。在我看来,创建人口金字塔唯一缺少的步骤是反转第一个面的x轴,这样它从50变成0,同时保持第二个面不变。有人能帮忙吗?

2wnc66cl

2wnc66cl1#

这里有一个不分面的解决方案。首先,创建数据框。我使用1到20的值来确保没有一个值是负的(使用人口金字塔,你不会得到负的计数/年龄)。

test <- data.frame(v=sample(1:20,1000,replace=T), g=c('M','F'))

然后分别为每个g值合并两个geom_bar()调用。对于F,计数按原样计算,但对于M,计数乘以-1以获得相反方向的条形图。然后使用scale_y_continuous()获得轴的漂亮值。

require(ggplot2)
require(plyr)    
ggplot(data=test,aes(x=as.factor(v),fill=g)) + 
  geom_bar(subset=.(g=="F")) + 
  geom_bar(subset=.(g=="M"),aes(y=..count..*(-1))) + 
  scale_y_continuous(breaks=seq(-40,40,10),labels=abs(seq(-40,40,10))) + 
  coord_flip()

更新

由于参数subset=.在最新的ggplot2版本中被弃用,因此使用函数subset()可以获得相同的结果。

ggplot(data=test,aes(x=as.factor(v),fill=g)) + 
  geom_bar(data=subset(test,g=="F")) + 
  geom_bar(data=subset(test,g=="M"),aes(y=..count..*(-1))) + 
  scale_y_continuous(breaks=seq(-40,40,10),labels=abs(seq(-40,40,10))) + 
  coord_flip()

3duebb1j

3duebb1j2#

人口金字塔的通用ggplot代码模板(如下)
1.使用geom_col(),而不是geom_bar(),后者具有更好的默认值stat,并避免了对coord_flip()的需要
1.通过在刻度函数中使用labels = abs,避免手动设置标签分隔符。
1.具有相同的男性和女性水平轴(和标签),以便于性别之间的比较-在lemon包中使用scale_x_symmetric()
1.仅使用一个几何体,避免了对数据进行子集化的需要;如果你想在一个小平面图中创建多个金字塔,这是很有用的。
正在创建数据...

set.seed(100)
a <- seq(from = 0, to = 90, by = 10)
d <- data.frame(age = paste(a, a + 10, sep = "-"),
                sex = rep(x = c("Female", "Male"), each = 10),
                pop = sample(x = 1:100, size = 20))
head(d)
#     age    sex pop
# 1  0-10 Female  74
# 2 10-20 Female  89
# 3 20-30 Female  78
# 4 30-40 Female  23
# 5 40-50 Female  86
# 6 50-60 Female  70

剧情代码...

library(ggplot2)
library(lemon)

ggplot(data = d, 
       mapping = aes(x = ifelse(test = sex == "Male", yes = -pop, no = pop), 
                     y = age, fill = sex)) +
  geom_col() +
  scale_x_symmetric(labels = abs) +
  labs(x = "Population")

ifmq2ha2

ifmq2ha23#

扩展@gjabel的帖子,这里是一个更清晰的人口金字塔,同样使用ggplot 2。

popPy1 <- ggplot(data = venDemo, 
   mapping = aes(
      x = AgeName, 
      y = ifelse(test = sex == "M",  yes = -Percent, no = Percent), 
      fill = Sex2,
      label=paste(round(Percent*100, 0), "%", sep="")
   )) +
geom_bar(stat = "identity") +
#geom_text( aes(label = TotalCount, TotalCount = TotalCount + 0.05)) +
geom_text(hjust=ifelse(test = venDemo$sex == "M",  yes = 1.1, no = -0.1), size=6, colour="#505050") +
#  scale_y_continuous(limits=c(0,max(appArr$Count)*1.7)) +
# The 1.1 at the end is a buffer so there is space for the labels on each side
scale_y_continuous(labels = abs, limits = max(venDemo$Percent) * c(-1,1) * 1.1) +
# Custom colours
scale_fill_manual(values=as.vector(c("#d23f67","#505050"))) +
# Remove the axis labels and the fill label from the legend - these are unnecessary for a Population Pyramid
labs(
  x = "",
  y = "",
  fill="", 
  family=fontsForCharts
) +
theme_minimal(base_family=fontsForCharts, base_size=20) +   
coord_flip() +
# Remove the grid and the scale
theme( 
  panel.grid.major = element_blank(), 
  panel.grid.minor = element_blank(),
  axis.text.x=element_blank(), 
  axis.text.y=element_text(family=fontsForCharts, size=20),
  strip.text.x=element_text(family=fontsForCharts, size=24),
  legend.position="bottom",
  legend.text=element_text(size=20)
)

popPy1

pw9qyyiw

pw9qyyiw4#

看看我的人口金字塔:

使用生成的数据,您可以执行以下操作:

# import the packages in an elegant way ####

packages <- c("tidyverse")

installed_packages <- packages %in% rownames(installed.packages())

if (any(installed_packages == FALSE)) {
  install.packages(packages[!installed_packages])
}

invisible(lapply(packages, library, character.only = TRUE))

# _________________________________________________________

# create data ####

sex_age <- data.frame(age=rnorm(n = 10000, mean = 50, sd = 9), sex=c(1, 2)))

# _________________________________________________________

# prepare data + build the plot ####

sex_age %>%
  mutate(sex = ifelse(sex == 1, "Male",
                      ifelse(sex == 2, "Female", NA))) %>% # construct from the sex variable: "Male","Female"
  select(age, sex) %>% # pick just the two variables
  table() %>% # table it
  as.data.frame.matrix() %>% # create data frame matrix
  rownames_to_column("age") %>% # rownames are now the age variable
  mutate(across(everything(), as.numeric),
         # mutate everything as.numeric()
         age = ifelse(
           # create age groups 5 year steps
           age >= 18 & age <= 22 ,
           "18-22",
           ifelse(
             age >= 23 & age <= 27,
             "23-27",
             ifelse(
               age >= 28 & age <= 32,
               "28-32",
               ifelse(
                 age >= 33 & age <= 37,
                 "33-37",
                 ifelse(
                   age >= 38 & age <= 42,
                   "38-42",
                   ifelse(
                     age >= 43 & age <= 47,
                     "43-47",
                     ifelse(
                       age >= 48 & age <= 52,
                       "48-52",
                       ifelse(
                         age >= 53 & age <= 57,
                         "53-57",
                         ifelse(
                           age >= 58 & age <= 62,
                           "58-62",
                           ifelse(
                             age >= 63 & age <= 67,
                             "63-67",
                             ifelse(
                               age >= 68 & age <= 72,
                               "68-72",
                               ifelse(
                                 age >= 73 & age <= 77,
                                 "73-77",
                                 ifelse(age >= 78 &
                                          age <= 82, "78-82", "83 and older")
                               )
                             )
                           )
                         )
                       )
                     )
                   )
                 )
               )
             )
           )
         )) %>%
  group_by(age) %>% # group by the age
  summarize(Female = sum(Female), # summarize the sum of each sex
            Male = sum(Male)) %>%
  pivot_longer(names_to = 'sex',
               # pivot longer
               values_to = 'Population',
               cols = 2:3) %>%
  mutate(
    # create a pop perc and a signal 1 / -1
    PopPerc = case_when(
      sex == 'Male' ~ round(Population / sum(Population) * 100, 2),
      TRUE ~ -round(Population / sum(Population) *
                      100, 2)
    ),
    signal = case_when(sex == 'Male' ~ 1,
                       TRUE ~ -1)
  ) %>%
  ggplot() + # build the plot with ggplot2
  geom_bar(aes(x = age, y = PopPerc, fill = sex), stat = 'identity') + # define aesthetics
  geom_text(aes(
    # create the text
    x = age,
    y = PopPerc + signal * .3,
    label = abs(PopPerc)
  )) +
  coord_flip() + # flip the plot
  scale_fill_manual(name = '', values = c('darkred', 'steelblue')) + # define the colors (darkred = female, steelblue = male)
  scale_y_continuous(
    # scale the y-lab
    breaks = seq(-10, 10, 1),
    labels = function(x) {
      paste(abs(x), '%')
    }
  ) +
  labs(
    # name the labs
    x = '',
    y = 'Participants in %',
    title = 'Population Pyramid',
    subtitle = paste0('N = ', nrow(sex_age)),
    caption = 'Source: '
  ) +
  theme(
    # costume the theme
    axis.text.x = element_text(vjust = .5),
    panel.grid.major.y = element_line(color = 'lightgray', linetype =
                                        'dashed'),
    legend.position = 'top',
    legend.justification = 'center'
  ) +
  theme_classic() # choose theme

相关问题