R: grouped bar chart in ggplot

sirbozc5 posted on 2023-03-27 in Other

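I have a survey file in which rows are observations (respondents) and columns are questions.
Here is what some fake data looks like (the first People column is the respondent ID; the last column is a question that is also called People):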

People,Food,Music,People
P1,Very Bad,Bad,Good
P2,Good,Good,Very Bad
P3,Good,Bad,Good
P4,Good,Very Bad,Very Good
P5,Bad,Good,Very Good
P6,Bad,Good,Very Good
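Note that the header contains `People` twice. When it is read with `read.csv()`, the default `check.names = TRUE` makes the names unique, so the second `People` column (the question) becomes `People.1`. A quick check, reading the fake data inline with `text =` instead of from the pastebin link (the object name `fake` is only illustrative):

```r
# Read the six-row fake survey inline instead of from a URL
fake <- read.csv(text = "People,Food,Music,People
P1,Very Bad,Bad,Good
P2,Good,Good,Very Bad
P3,Good,Bad,Good
P4,Good,Very Bad,Very Good
P5,Bad,Good,Very Good
P6,Bad,Good,Very Good", header = TRUE)

# check.names = TRUE (the default) de-duplicates the header,
# so the "People" question column is renamed to "People.1"
print(names(fake))
```

That renaming is why the tidyverse code in the first answer selects `People.1`.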

My goal is to create this kind of plot (a grouped bar chart) with ggplot2.

  • I don't care at all about colors, design, etc.
  • The example plot does not match the fake data

Here is how I read and prepare my fake data:

raw <- read.csv("http://pastebin.com/raw.php?i=L8cEKcxS",sep=",")
raw[,2]<-factor(raw[,2],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,3]<-factor(raw[,3],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,4]<-factor(raw[,4],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
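As a side note, the three `factor()` calls can be collapsed into one `lapply()` over columns 2 to 4. A sketch on the same fake data read inline (`fake` and `lvls` are illustrative names; `ordered = FALSE` is the default, so it is dropped):

```r
fake <- read.csv(text = "People,Food,Music,People
P1,Very Bad,Bad,Good
P2,Good,Good,Very Bad
P3,Good,Bad,Good
P4,Good,Very Bad,Very Good
P5,Bad,Good,Very Good
P6,Bad,Good,Very Good", header = TRUE)

# Desired order of the answer categories, from worst to best
lvls <- c("Very Bad", "Bad", "Good", "Very Good")

# Turn every question column (2 to 4) into a factor with the same levels
fake[, 2:4] <- lapply(fake[, 2:4], factor, levels = lvls)

str(fake)
```

Unused levels are kept, so for example `Very Good` is counted as 0 for `Food` instead of disappearing.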

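But if I use the count as y, I'm stuck choosing what goes on x and what the grouping variable should be. I'm not sure this can be done without reshape2. I also tried reshape with its melt() function, but I don't know how to use it.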

Answer 1

**Edit:** many years later

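For a pure ggplot2 + utils::stack() solution, see @markus's answer.
Here is a somewhat verbose tidyverse solution, with every non-base package spelled out explicitly so you can see where each function comes from: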

library(magrittr) # needed for %>% if dplyr is not attached

"http://pastebin.com/raw.php?i=L8cEKcxS" %>%
  utils::read.csv(sep = ",") %>%
  tidyr::pivot_longer(cols = c(Food, Music, People.1),
                      names_to = "variable",
                      values_to = "value") %>%
  dplyr::group_by(variable, value) %>%
  dplyr::summarise(n = dplyr::n()) %>%
  dplyr::mutate(value = factor(
    value,
    levels = c("Very Bad", "Bad", "Good", "Very Good"))
  ) %>%
  ggplot2::ggplot(ggplot2::aes(variable, n)) +
  ggplot2::geom_bar(ggplot2::aes(fill = value),
                    position = "dodge",
                    stat = "identity")

Original answer:
First, you need the counts for each category, i.e. how many Bads, Goods, etc. each group (Food, Music, People) has. This does it:

raw <- read.csv("http://pastebin.com/raw.php?i=L8cEKcxS",sep=",")
raw[,2]<-factor(raw[,2],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,3]<-factor(raw[,3],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,4]<-factor(raw[,4],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)

raw <- raw[, c(2, 3, 4)]  # drop the respondent ID column (the first "People"); keep the three questions

freq <- table(col(raw), as.matrix(raw))  # count each answer per question column
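What the `table()` call does: `col(raw)` returns a matrix the same shape as `raw` holding each cell's column number, and `as.matrix(raw)` turns the columns into a character matrix, so `table()` cross-tabulates column number (rows of `freq`) against answer text (columns of `freq`). Because the answers are plain strings at that point, the columns come out in alphabetical order, which is why the next step reorders them. A small check on the six-row fake data from the question, read inline (the factor step is skipped here; it does not change the result, since `as.matrix()` converts factors to character anyway):

```r
fake <- read.csv(text = "People,Food,Music,People
P1,Very Bad,Bad,Good
P2,Good,Good,Very Bad
P3,Good,Bad,Good
P4,Good,Very Bad,Very Good
P5,Bad,Good,Very Good
P6,Bad,Good,Very Good", header = TRUE)

# Keep only the three question columns
raw <- fake[, 2:4]

# Rows of freq: column number (1 = Food, 2 = Music, 3 = People.1)
# Columns of freq: the distinct answers, sorted alphabetically
freq <- table(col(raw), as.matrix(raw))
print(freq)
```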

Then you need to build a data frame from it, melt it and plot it:

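library(reshape2)  # for melt()
library(ggplot2)

Names <- c("Food", "Music", "People")   # question names, in column order
data <- data.frame(cbind(freq), Names)  # counts and names in one data frame
data <- data[, c(5, 3, 1, 2, 4)]        # Names first, then Very.Bad, Bad, Good, Very.Good
                                        # (table() sorted the answers alphabetically)

# melt the data frame into long format for plotting
data.m <- melt(data, id.vars = 'Names')

# plot everything
ggplot(data.m, aes(Names, value)) +
  geom_bar(aes(fill = variable), position = "dodge", stat = "identity")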
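As an alternative to the `cbind()`, reorder and `melt()` steps above, `as.data.frame()` on a table already returns long format: one row per cell, with the count in a `Freq` column. A minimal base R sketch on the six-row fake data, assuming ggplot2 is installed for the plot (the plot is skipped otherwise); `long` and `counts_long` are illustrative names:

```r
fake <- read.csv(text = "People,Food,Music,People
P1,Very Bad,Bad,Good
P2,Good,Good,Very Bad
P3,Good,Bad,Good
P4,Good,Very Bad,Very Good
P5,Bad,Good,Very Good
P6,Bad,Good,Very Good", header = TRUE)

# stack() the question columns: 'values' = answer, 'ind' = question
long <- stack(fake[, -1])
long$values <- factor(long$values,
                      levels = c("Very Bad", "Bad", "Good", "Very Good"))

# One row per (question, answer) pair, count in 'Freq'
counts_long <- as.data.frame(table(question = long$ind, answer = long$values))
print(counts_long)

# Counts are precomputed, so geom_col() (same as geom_bar(stat = "identity"))
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(counts_long, aes(question, Freq, fill = answer)) +
    geom_col(position = "dodge")
  print(p)
}
```

Setting the factor levels before calling `table()` also fixes the bar order, so no column reordering is needed.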

Is this what you were looking for?

To clarify a bit: in the question ggplot multiple grouping bar, the data frame looks like this:

> head(df)
  ID Type Annee X1PCE X2PCE X3PCE X4PCE X5PCE X6PCE
1  1    A  1980   450   338   154    36    13     9
2  2    A  2000   288   407   212    54    16    23
3  3    A  2020   196   434   246    68    19    36
4  4    B  1980   111   326   441    90    21    11
5  5    B  2000    63   298   443   133    42    21
6  6    B  2020    36   257   462   162    55    30

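Since columns 4 to 9 hold numeric values that are later plotted on the y axis, that data frame can easily be reshaped (melted) and plotted.
Our current dataset needs a similar structure, so we use freq=table(col(raw), as.matrix(raw)) to get: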

> data
   Names Very.Bad Bad Good Very.Good
1   Food        7   6    5         2
2  Music        5   5    7         3
3 People        6   3    7         4

Imagine you have Very.Bad, Bad, Good, etc. instead of X1PCE, X2PCE, X3PCE. See the similarity? But we first need to create such a structure, hence freq=table(col(raw), as.matrix(raw)).
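If you want that wide structure directly (one row per question, one column per answer, like `data` above), here is a sketch that builds it from `stack()`ed data. Because the answers are a factor with explicit levels, the columns already come out as `Very.Bad`, `Bad`, `Good`, `Very.Good`. It uses the six-row fake data; `long`, `m` and `wide` are illustrative names:

```r
fake <- read.csv(text = "People,Food,Music,People
P1,Very Bad,Bad,Good
P2,Good,Good,Very Bad
P3,Good,Bad,Good
P4,Good,Very Bad,Very Good
P5,Bad,Good,Very Good
P6,Bad,Good,Very Good", header = TRUE)

long <- stack(fake[, -1])
long$values <- factor(long$values,
                      levels = c("Very Bad", "Bad", "Good", "Very Good"))

# Rows = questions, columns = answers in factor-level order
m <- table(long$ind, long$values)

# Wide data frame: Names column plus one count column per answer;
# data.frame() turns "Very Bad" into the syntactic name "Very.Bad"
wide <- data.frame(Names = rownames(m), as.data.frame.matrix(m),
                   row.names = NULL)
print(wide)
```

The question column shows up as `People.1` here because of the duplicated header.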

Answer 2

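In @jakub's answer the counting is done before the data is passed to ggplot(), which is why stat in geom_bar is set to "identity" (i.e. use the data as-is, without transforming it).
Alternatively, you can let ggplot do the counting for you with stat = "count", which is the default for geom_bar: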

library(ggplot2)
ggplot(stack(df1[, -1]), aes(ind, fill = values)) +
  geom_bar(position = "dodge")

Data

df1 <- read.csv(text = "People,Food,Music,People
P1,Very Bad,Bad,Good
P2,Good,Good,Very Bad
P3,Good,Bad,Good
P4,Good,Very Bad,Very Good
P5,Bad,Good,Very Good
P6,Bad,Good,Very Good
P7,Bad,Very Bad,Good
P8,Very Good,Very Bad,Good
P9,Very Bad,Good,Bad
P10,Bad,Good,Very Bad
P11,Good,Bad,Very Bad
P12,Very Bad,Bad,Very Good
P13,Bad,Very Good,Bad
P14,Bad,Very Good,Very Bad
P15,Good,Good,Good
P16,Very Bad,Very Good,Very Bad
P17,Very Bad,Good,Good
P18,Very Bad,Very Bad,Bad
P19,Very Good,Very Bad,Very Bad
P20,Very Bad,Bad,Good", header = TRUE)
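One thing to watch with this approach: since R 4.0.0 `read.csv()` keeps strings as character, so after `stack()` the `values` column is character and the fill legend is sorted alphabetically (Bad, Good, Very Bad, Very Good). A sketch that sets the answer order before plotting, using the `df1` data above and assuming ggplot2 is installed (the plot is skipped otherwise). The printed table is what `stat = "count"` tallies, and it matches the counts in @jakub's answer:

```r
df1 <- read.csv(text = "People,Food,Music,People
P1,Very Bad,Bad,Good
P2,Good,Good,Very Bad
P3,Good,Bad,Good
P4,Good,Very Bad,Very Good
P5,Bad,Good,Very Good
P6,Bad,Good,Very Good
P7,Bad,Very Bad,Good
P8,Very Good,Very Bad,Good
P9,Very Bad,Good,Bad
P10,Bad,Good,Very Bad
P11,Good,Bad,Very Bad
P12,Very Bad,Bad,Very Good
P13,Bad,Very Good,Bad
P14,Bad,Very Good,Very Bad
P15,Good,Good,Good
P16,Very Bad,Very Good,Very Bad
P17,Very Bad,Good,Good
P18,Very Bad,Very Bad,Bad
P19,Very Good,Very Bad,Very Bad
P20,Very Bad,Bad,Good", header = TRUE)

# Long format: 'values' = answer, 'ind' = question (Food, Music, People.1)
long <- stack(df1[, -1])

# Explicit answer order so bars and legend run from "Very Bad" to "Very Good"
long$values <- factor(long$values,
                      levels = c("Very Bad", "Bad", "Good", "Very Good"))

# The same counts that stat = "count" computes, for checking
counts <- table(long$ind, long$values)
print(counts)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # geom_bar() defaults to stat = "count", so no y aesthetic is needed
  p <- ggplot(long, aes(ind, fill = values)) +
    geom_bar(position = "dodge")
  print(p)
}
```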
