How do I add geom_vline() at the percentiles of a distribution in R?

Posted by juzqafwq on 2023-03-27 in Other

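I have a density plot that looks like this:

I want to add vertical lines at the 25%, 50%, and 75% quantiles of the distribution in each facet.
I think I could do this "manually" using summary statistics for each year, but I assume there is a more efficient way?
Here is my code: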

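library(dplyr)
library(ggplot2)

merge(usa_census_sub, CPI, by.x = "YEAR", by.y = "year") %>%
  select(YEAR, INCWAGE, rate) %>%
  # keep 1960 onward and drop the 999999 and 0 wage codes
  filter(YEAR >= 1960 & INCWAGE != 999999 & INCWAGE != 0) %>%
  # use `=` inside mutate(); `<-` would not create a column named log_INCWAGE
  mutate(log_INCWAGE = log10(INCWAGE),
         RWAGE = (log_INCWAGE / rate) * 158) %>%
  # the piped data frame is already the plot data
  ggplot(mapping = aes(x = RWAGE)) +
  geom_density() +
  facet_wrap(~YEAR)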

8wtpewkr1#

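You could use stat_summary with geom = "vline" to calculate the quantiles. stat_summary calculates the y values for each value of x, which is why x = 0 is used here: it puts all observations in a panel into a single group. With fun.data you can calculate the coordinates of the quantiles for each facet, like this: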

library(ggplot2)
df |>
  ggplot(aes(x = RWAGE)) +
  geom_density() +
  stat_summary(aes(x = 0, y = RWAGE),
               geom = "vline",
               fun.data = function(x) data.frame(xintercept = quantile(x, c(0.25, 0.5, 0.75)))) +
  facet_wrap(~year)

Created on 2023-03-26 with reprex v2.0.2

Data:

set.seed(7)
df = data.frame(year = rep(c(1960, 1970, 1980, 1990, 2010), each = 100),
                RWAGE = rep(runif(100, 0, 15), each = 5))
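To see what the function passed to fun.data above hands to the vline geom, it can help to call it on a small vector outside of ggplot2. This is only an illustrative sketch: the name quantile_lines is hypothetical and not part of the answer, and the input vector is made up.

```r
# Same body as the anonymous function passed to fun.data above;
# the name `quantile_lines` is an assumption used for illustration only.
quantile_lines <- function(x) {
  data.frame(xintercept = quantile(x, c(0.25, 0.5, 0.75)))
}

# One row per quantile; each row becomes one vertical line in a panel.
res <- quantile_lines(c(1, 2, 3, 4, 5))
print(res)
# xintercept is 2, 3, 4 for the 25%, 50%, and 75% rows
```

Because stat_summary applies this function separately within each facet panel, every panel gets its own three lines.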

iqjalb3h2#

We could do it like this:
Data:

set.seed(123)
df <- data.frame(
  year = rep(c(1960, 1970, 1980, 1990, 2000, 2010), each = 100),
  # only 200 values are generated; data.frame() recycles them across the 600 rows
  value = c(rnorm(100, mean = 10, sd = 3),
            rnorm(100, mean = 12, sd = 2))
)
library(dplyr)
library(ggplot2)

# define quantiles
quantiles <- df %>% 
  group_by(year) %>% 
  summarize(q25 = quantile(value, 0.25), 
            q50 = quantile(value, 0.5), 
            q75 = quantile(value, 0.75))

# plot
ggplot(df, aes(x = value, fill = factor(year))) +
  geom_density(alpha = 0.5) +
  facet_wrap(~ year, ncol = 3)+
  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  geom_vline(data = quantiles, aes(xintercept = q25),
             linetype = "dashed", linewidth = 1) +
  geom_vline(data = quantiles, aes(xintercept = q50),
             linetype = "dashed", linewidth = 1) +
  geom_vline(data = quantiles, aes(xintercept = q75),
             linetype = "dashed", linewidth = 1)
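As a possible variation on the approach above, the three geom_vline layers could be collapsed into one by storing the quantiles in long format, with one row per year and probability. The sketch below uses its own small simulated data frame and base R for the summary step; the names probs, quantiles_long, and the column x are assumptions for illustration, not part of the answer.

```r
library(ggplot2)

# Simulated data, assumed for illustration only
set.seed(123)
df <- data.frame(
  year  = rep(c(1960, 1970, 1980), each = 100),
  value = rnorm(300, mean = 10, sd = 3)
)

probs <- c(0.25, 0.5, 0.75)

# One row per (year, probability), holding the x position of each line
quantiles_long <- do.call(rbind, lapply(split(df, df$year), function(d) {
  data.frame(year = d$year[1],
             prob = probs,
             x    = unname(quantile(d$value, probs)))
}))

# A single geom_vline layer draws all three lines in every facet;
# mapping linetype to the probability tells the quantiles apart.
p <- ggplot(df, aes(x = value)) +
  geom_density() +
  geom_vline(data = quantiles_long,
             aes(xintercept = x, linetype = factor(prob))) +
  facet_wrap(~ year)

print(p)
```

Since quantiles_long carries the year column, facet_wrap routes each set of lines to the matching panel, and adding another quantile only requires changing probs.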
