R: Adding weights to a bar chart function in ggplot2

bf1o4zei, posted on 2023-02-06 in Other

I am working with survey data that has 250 columns. A sample of the data looks like this:

q1 <- factor(c("yes",NA,"no","yes",NA,"yes","no","yes"))
q2 <- factor(c("Albania","USA","Albania","Albania","UK",NA,"UK","Albania"))
q3 <- factor(c(0,1,NA,0,1,1,NA,0))
q4 <- factor(c(0,NA,NA,NA,1,NA,0,0))
q5 <- factor(c("Dont know","Prefer not to answer","Agree","Disagree",NA,"Agree","Agree",NA))
q6 <- factor(c(1,NA,3,5,800,NA,900,2))
sector <- factor(c("Energy","Water","Energy","Other","Other","Water","Transportation","Energy"))
weights <- factor(c(0.13,0.25,0.13,0.22,0.22,0.25,0.4,0.13))

data <- data.frame(q1,q2,q3,q4,q5,q6,sector,weights)

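With help from Stack Overflow, I wrote the following function to loop over the columns and create bar charts, where the x-axis shows the percentage of responses, the y-axis shows the underlying column, and the fill is the sector.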

library(dplyr)
library(ggplot2)

plot_fun <- function(variable) {
  total <- sum(!is.na(data[[variable]]))
  
  data <- data |> 
    filter(!is.na(.data[[variable]])) |> 
    group_by(across(all_of(c("sector", variable)))) |> 
    summarise(n = n(), .groups = "drop_last") |> 
    mutate(pct = n / sum(n)) |> 
    ungroup()
  
  ggplot(
    data = data,
    mapping = aes(fill = sector, x = pct, y = .data[[variable]])
  ) +
    geom_col(position = "dodge") +
    labs(
      y = variable, x = "Percentage of responses", fill = "Sector legend",
      caption = paste("Total =", total)
    ) +
    geom_text(
      aes(
        label = scales::percent(pct, accuracy = 0.1)
      ),
      position = position_dodge(.9), vjust = 0.5
    ) +
    scale_x_continuous(labels=function(x) paste0(x*100))+
    scale_fill_brewer(palette = "Accent")+
    theme_bw() +
    theme(panel.grid.major.y = element_blank()) 
}

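Now I want to apply the survey weights so that the bar charts show weighted response percentages. I tried adding weight = data$weights to mapping(), but that did not work. I also tried applying the weights in the percentage calculation via summarise(n = sum(weights)), but that did not work either.
Is there a way to modify my code so that the weights are applied? Thanks in advance.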

z9zf31ra (answer #1)

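It is not clear how the weights should be applied. Here I assume you want to multiply the percentage by the weight. Note that you need to modify your data: if you want to use the weights as numeric values in a calculation, they should not be a factor. Either way, include the weights in group_by so that they are carried through, then create the weighted percentage in mutate.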

plot_fun <- function(variable) {
    total <- sum(!is.na(data[[variable]]))
    
    data <- data |> 
        filter(!is.na(.data[[variable]])) |> 
        group_by(across(all_of(c("sector", "weights", variable)))) |> 
        summarise(n = n(), .groups = "drop_last") |> 
        mutate(pct = n / sum(n), wpct  = pct*weights) |> 
        ungroup()
    
    ggplot(
        data = data,
        mapping = aes(fill = sector, x = wpct, y = .data[[variable]])
    ) +
        geom_col(position = "dodge") +
        labs(
            y = variable, x = "Percentage of responses", fill = "Sector legend",
            caption = paste("Total =", total)
        ) +
        geom_text(
            aes(
                label = scales::percent(wpct, accuracy = 0.1)
            ),
            position = position_dodge(.9), vjust = 0.5
        ) +
        scale_x_continuous(labels=function(x) paste0(x*100))+
        scale_fill_brewer(palette = "Accent")+
        theme_bw() +
        theme(panel.grid.major.y = element_blank()) 
}
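
On the point about the weights not being a factor: a minimal sketch of the conversion, using the sample weights from the question. The names `codes` and `weights_num` are illustrative only. Calling `as.numeric()` directly on a factor returns the internal level codes rather than the original values, so the usual route is to go through `as.character()` first.

```r
# Sample weights from the question, stored as a factor
weights <- factor(c(0.13, 0.25, 0.13, 0.22, 0.22, 0.25, 0.4, 0.13))

# Pitfall: as.numeric() on a factor gives the integer level codes,
# not the values the levels represent
codes <- as.numeric(weights)

# Convert via character to recover the original numbers
weights_num <- as.numeric(as.character(weights))

print(codes)
print(weights_num)
```

In the question's data frame, the same conversion would be `data$weights <- as.numeric(as.character(data$weights))`, run once before calling `plot_fun()`.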

If this does not work, please state clearly how the weights should be used and what the final result values should be.
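
For comparison, here is a sketch of a different reading of "weighted percentage", which may or may not be what the asker intends: instead of multiplying the share by the weight, sum the weights per response and divide by the total weight within each sector. This is an assumption, not the method from the answer. The helper name `weighted_pct` and the column `w` are hypothetical, and the weights are taken to be numeric already.

```r
library(dplyr)

# Subset of the question's sample data; weights are numeric here (assumption)
data <- data.frame(
  q1 = factor(c("yes", NA, "no", "yes", NA, "yes", "no", "yes")),
  sector = factor(c("Energy", "Water", "Energy", "Other", "Other",
                    "Water", "Transportation", "Energy")),
  weights = c(0.13, 0.25, 0.13, 0.22, 0.22, 0.25, 0.4, 0.13)
)

# Hypothetical helper: weighted share of each response within each sector
weighted_pct <- function(data, variable) {
  data |>
    filter(!is.na(.data[[variable]])) |>
    group_by(across(all_of(c("sector", variable)))) |>
    # Sum of weights per sector/response replaces the raw count n()
    summarise(w = sum(weights), .groups = "drop_last") |>
    # Still grouped by sector here, so shares sum to 1 per sector
    mutate(pct = w / sum(w)) |>
    ungroup()
}

res <- weighted_pct(data, "q1")
print(res)
```

The resulting `pct` column could be mapped to `x` in the same ggplot call as before. In this sample the weights are constant within each sector, so the weighted shares coincide with the unweighted ones; the two only differ once weights vary between respondents in the same sector.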
