Overlaying histograms with ggplot2 in R

Posted by h5qlskok on 2023-05-11 in Other


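I'm new to R and am trying to plot 3 histograms on the same graph. Everything works, but my problem is that you can't see where 2 histograms overlap: they look rather cut off.
When I make density plots, it looks perfect: each curve is surrounded by a black outline, and the colors look different where the curves overlap.
Can someone tell me whether something similar can be achieved with histograms? This is the code I'm using: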

library(ggplot2)

lowf0 <- read.csv(....)
mediumf0 <- read.csv(....)
highf0 <- read.csv(....)
# label each data set so the groups can be told apart after combining
lowf0$utt <- 'low f0'
mediumf0$utt <- 'medium f0'
highf0$utt <- 'high f0'
histogram <- rbind(lowf0, mediumf0, highf0)
ggplot(histogram, aes(f0, fill = utt)) + geom_histogram(alpha = 0.2)

Source: https://stackoverflow.com/questions/6957549/overlaying-histograms-with-ggplot2-in-r

3 answers


sgtfey8w1#

Using @joran's sample data:

ggplot(dat, aes(x=xx, fill=yy)) + 
  geom_histogram(alpha=0.2, position="identity")

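Note that geom_histogram() defaults to position = "stack", which piles the groups' bars on top of each other instead of overlapping them.
See "Position adjustments" in the geom_histogram documentation.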
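To see the difference concretely, the sketch below builds the same plot with both positions and inspects the computed bars with `layer_data()`. The two-group data frame `df` is hypothetical, made up for illustration. With `"stack"`, one group's bars start on top of the other group's; with `"identity"`, every bar starts at zero, so the groups overlap and the alpha blending shows through.

```r
library(ggplot2)

# Hypothetical data: two overlapping normal samples
set.seed(1)
df <- data.frame(
  x = c(rnorm(500, mean = 0), rnorm(500, mean = 1)),
  g = rep(c("a", "b"), each = 500)
)

p_stack    <- ggplot(df, aes(x, fill = g)) +
  geom_histogram(bins = 30, alpha = 0.4)                         # default: "stack"
p_identity <- ggplot(df, aes(x, fill = g)) +
  geom_histogram(bins = 30, alpha = 0.4, position = "identity")  # overlaid

# layer_data() returns the bar geometry ggplot2 computed for layer 1
stacked  <- layer_data(p_stack)
overlaid <- layer_data(p_identity)

any(stacked$ymin > 0)    # TRUE: some bars sit on top of the other group
all(overlaid$ymin == 0)  # TRUE: every bar starts at zero
```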

2023-05-11

vltsax252#

Your current code:

ggplot(histogram, aes(f0, fill = utt)) + geom_histogram(alpha = 0.2)

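tells ggplot to construct *one* histogram using all the values in f0 and then color the bars of this single histogram according to the variable utt.
What you want instead is three separate histograms, with alpha blending so that they are visible through each other. So you probably want three separate calls to geom_histogram, where each one gets its own data frame and fill: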

library(ggplot2)

# one layer per data set, each with its own fill color
ggplot(histogram, aes(f0)) + 
    geom_histogram(data = lowf0, fill = "red", alpha = 0.2) + 
    geom_histogram(data = mediumf0, fill = "blue", alpha = 0.2) +
    geom_histogram(data = highf0, fill = "green", alpha = 0.2)

Here is a concrete example:

library(ggplot2)

# three uniform samples with partly overlapping ranges, labelled a, b, c
dat <- data.frame(xx = c(runif(100, 20, 50), runif(100, 40, 80), runif(100, 0, 30)),
                  yy = rep(letters[1:3], each = 100))

ggplot(dat, aes(x = xx)) + 
    geom_histogram(data = subset(dat, yy == 'a'), fill = "red", alpha = 0.2) +
    geom_histogram(data = subset(dat, yy == 'b'), fill = "blue", alpha = 0.2) +
    geom_histogram(data = subset(dat, yy == 'c'), fill = "green", alpha = 0.2)

This draws the three subsets as overlaid, semi-transparent histograms.

Edited to fix a typo: you want fill, not colour.
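Since each `geom_histogram()` call gets its own `data`, each layer bins only its own subset. A quick check with `layer_data()`, reusing the `dat` example with two additions that are assumptions, not part of the answer: a seed for reproducibility and an explicit `bins = 30` to silence the default-bins message.

```r
library(ggplot2)

set.seed(42)  # added for reproducibility
dat <- data.frame(xx = c(runif(100, 20, 50), runif(100, 40, 80), runif(100, 0, 30)),
                  yy = rep(letters[1:3], each = 100))

p <- ggplot(dat, aes(x = xx)) +
  geom_histogram(data = subset(dat, yy == "a"), fill = "red",   alpha = 0.2, bins = 30) +
  geom_histogram(data = subset(dat, yy == "b"), fill = "blue",  alpha = 0.2, bins = 30) +
  geom_histogram(data = subset(dat, yy == "c"), fill = "green", alpha = 0.2, bins = 30)

# Each layer counts only its own 100 rows
layer_counts <- sapply(1:3, function(i) sum(layer_data(p, i)$count))
layer_counts  # 100 100 100
```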

2023-05-11

a8jjtwal3#

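Although plotting multiple/overlapping histograms takes only a few lines in ggplot2, the results are not always satisfying. Borders and colors need to be chosen carefully so that the eye can tell the histograms apart.
The functions below balance border colors, opacity, and superimposed density plots so that the viewer can *distinguish the distributions*.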

Single histogram:

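library(ggplot2)  # after_stat() needs ggplot2 >= 3.3.0, linewidth needs >= 3.4.0

plot_histogram <- function(df, feature) {
    # .data[[feature]] looks up the column named by the string `feature`
    plt <- ggplot(df, aes(x = .data[[feature]])) +
    geom_histogram(aes(y = after_stat(density)), alpha = 0.7, fill = "#33AADE", color = "black") +
    geom_density(alpha = 0.3, fill = "red") +
    # dashed vertical line at the mean of the feature
    geom_vline(xintercept = mean(df[[feature]]), color = "black", linetype = "dashed", linewidth = 1) +
    labs(x = feature, y = "Density")
    print(plt)
}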
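The `y = ..density..` mapping (spelled `after_stat(density)` in ggplot2 3.3.0 and later) is what lets the bars share an axis with `geom_density()`: the bars are rescaled so that their total area is 1, like the area under a density curve. A small check on `iris`, assuming a recent ggplot2 (the explicit `bins = 30` is an addition):

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Width)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30)

bars <- layer_data(p)  # computed bar geometry for layer 1
# density = count / (total count * bin width), so density * width sums to 1
total_area <- sum(bars$density * (bars$xmax - bars$xmin))
total_area  # 1 (up to floating-point error)
```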

Multiple histograms:

plot_multi_histogram <- function(df, feature, label_column) {
    # fill by the label column; position = "identity" overlays instead of stacking
    plt <- ggplot(df, aes(x = .data[[feature]], fill = .data[[label_column]])) +
    geom_histogram(aes(y = after_stat(density)), alpha = 0.7, position = "identity", color = "black") +
    geom_density(alpha = 0.7) +
    # dashed vertical line at the overall mean
    geom_vline(xintercept = mean(df[[feature]]), color = "black", linetype = "dashed", linewidth = 1) +
    labs(x = feature, y = "Density")
    plt + guides(fill = guide_legend(title = label_column))
}
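With `fill` mapped to a label column, the density scaling is computed per group, so each group's bars have total area 1 regardless of how many rows the group has. A sketch of that check on `iris` (the explicit `bins = 30` is an addition):

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Width, fill = Species)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30,
                 position = "identity", alpha = 0.7, color = "black")

bars <- layer_data(p)
# total bar area for each fill group (one group per Species)
areas <- tapply(bars$density * (bars$xmax - bars$xmin), bars$group, sum)
areas  # each value is 1 (up to floating-point error)
```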

Usage:

Just pass your data frame, along with the required arguments, to the functions above:

plot_histogram(iris, 'Sepal.Width')

plot_multi_histogram(iris, 'Sepal.Width', 'Species')

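The extra argument to plot_multi_histogram is the name of the column containing the category labels.
We can see the effect more dramatically by creating a data frame with many different distributions: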

a <-data.frame(n=rnorm(1000, mean = 1), category=rep('A', 1000))
b <-data.frame(n=rnorm(1000, mean = 2), category=rep('B', 1000))
c <-data.frame(n=rnorm(1000, mean = 3), category=rep('C', 1000))
d <-data.frame(n=rnorm(1000, mean = 4), category=rep('D', 1000))
e <-data.frame(n=rnorm(1000, mean = 5), category=rep('E', 1000))
f <-data.frame(n=rnorm(1000, mean = 6), category=rep('F', 1000))
many_distros <- do.call('rbind', list(a,b,c,d,e,f))
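The six `data.frame()` calls differ only in the mean and the label, so the same `many_distros` layout (columns `n` and `category`) can also be generated in one step. This is an alternative sketch, not the answer's code; the seed and the `means` vector are additions:

```r
set.seed(123)  # added for reproducibility
means <- 1:6   # one mean per category, as in the six calls above

many_distros <- data.frame(
  # 1000 normal draws per mean, concatenated in order
  n        = unlist(lapply(means, function(m) rnorm(1000, mean = m))),
  # matching labels A..F, 1000 of each
  category = rep(LETTERS[1:6], each = 1000)
)

tapply(many_distros$n, many_distros$category, mean)  # close to 1, 2, ..., 6
```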

Pass in the data frame as before (the repr.plot.* options enlarge the plot when running in a Jupyter notebook):

options(repr.plot.width = 20, repr.plot.height = 8)
plot_multi_histogram(many_distros, 'n', 'category')

To add a separate vertical line for each distribution:

plot_multi_histogram <- function(df, feature, label_column, means) {
    plt <- ggplot(df, aes(x = .data[[feature]], fill = .data[[label_column]])) +
    geom_histogram(aes(y = after_stat(density)), alpha = 0.7, position = "identity", color = "black") +
    geom_density(alpha = 0.7) +
    # one dashed line per value in `means`
    geom_vline(xintercept = means, color = "black", linetype = "dashed", linewidth = 1) +
    labs(x = feature, y = "Density")
    plt + guides(fill = guide_legend(title = label_column))
}

Compared with the previous plot_multi_histogram function, the only changes are the added means argument and the geom_vline line, which now accepts multiple values.

Usage:

options(repr.plot.width = 20, repr.plot.height = 8)
plot_multi_histogram(many_distros, "n", 'category', c(1, 2, 3, 4, 5, 6))


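Because I set the means explicitly when building many_distros, I can simply pass them in. Alternatively, you could compute them inside the function and use them that way.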
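One way to compute the means inside the function, sketched below, draws one dashed line per group. The function name `plot_multi_histogram_auto` and the helper data frame `group_means` are hypothetical, and `after_stat()`/`linewidth` assume ggplot2 3.4.0 or later:

```r
library(ggplot2)

# Hypothetical variant: per-group means are computed, not passed in
plot_multi_histogram_auto <- function(df, feature, label_column) {
  m <- tapply(df[[feature]], df[[label_column]], mean)  # one mean per label
  group_means <- data.frame(group = names(m), group_mean = as.numeric(m))

  ggplot(df, aes(x = .data[[feature]], fill = .data[[label_column]])) +
    geom_histogram(aes(y = after_stat(density)), bins = 30, alpha = 0.7,
                   position = "identity", color = "black") +
    geom_density(alpha = 0.7) +
    # geom_vline() does not inherit the plot's aes, so its own data is enough
    geom_vline(data = group_means, aes(xintercept = group_mean),
               color = "black", linetype = "dashed", linewidth = 1) +
    labs(x = feature, y = "Density", fill = label_column)
}

p <- plot_multi_histogram_auto(iris, "Sepal.Width", "Species")
p
```

Computing the means from the data keeps the lines correct for any data frame, at the cost of no longer being able to mark arbitrary reference values.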

2023-05-11
