R: ggplot2, how to annotate summary statistics on each panel of a faceted plot

5t7ly7z5 posted on 2023-06-03 in Other

How can I add a text annotation showing the standard deviation (for example: sd = sd_value) to each panel of a faceted plot, using ggplot2 in R?

library(ggplot2)
library(datasets)
data(mtcars)
ggplot(data = mtcars, aes(x = hp)) + 
  geom_dotplot(binwidth = 1) + 
  geom_density() + 
  facet_grid(. ~ cyl) + 
  theme_bw()

I would post a picture of the plot, but I don't have enough reputation.
I think geom_text or annotate might be useful, but I'm not sure how to use them.

fumotvh3 #1

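If you want the text label to differ from facet to facet, you will need to use geom_text with a data frame that has one row per facet. If you want the same text in every facet, you can use annotate.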

p <- ggplot(data = mtcars, aes(x = hp)) + 
  geom_dotplot(binwidth = 1) + 
  geom_density() + 
  facet_grid(. ~ cyl)

mylabels <- data.frame(
  cyl = c(4, 6, 8), 
  label = c("first label", "second label different", "and another")
)

p + geom_text(x = 200, y = 0.75, aes(label = label), data = mylabels)

### compare that to this way with annotate

p + annotate("text", x = 200, y = 0.75, label = "same label everywhere")
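
Hard-coded coordinates such as x = 200, y = 0.75 only work while the axis ranges stay the same. A minimal sketch of one common alternative is to pin the label to a panel corner with infinite coordinates and pull it back inside with hjust/vjust. The names corner_labels and p_corner are illustrative and not part of the original answer, and the dot plot layer is left out to keep the example short:

```r
library(ggplot2)

# One row per facet; the cyl column must match the faceting variable
corner_labels <- data.frame(
  cyl = c(4, 6, 8),
  label = c("first label", "second label different", "and another")
)

p_corner <- ggplot(data = mtcars, aes(x = hp)) +
  geom_density() +
  facet_grid(. ~ cyl) +
  geom_text(
    data = corner_labels,
    aes(label = label),
    x = Inf, y = Inf,          # top-right corner of each panel
    hjust = 1.1, vjust = 1.5,  # nudge the text back inside the panel
    inherit.aes = FALSE        # do not inherit x = hp from the main plot
  )

p_corner
```

The hjust and vjust values are a matter of taste; values slightly above 1 move the text left and down from the corner.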

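Now, if you actually want the standard deviation of hp for each value of cyl in this example, I would probably compute it with dplyr first and then add it with geom_text, like so: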

library(ggplot2)
library(dplyr)
    
df.sd.hp <- mtcars %>%
  group_by(cyl) %>%
  summarise(hp.sd = round(sd(hp), 2))
    
ggplot(data = mtcars, aes(x = hp)) + 
  geom_dotplot(binwidth = 1) + 
  geom_density() + 
  facet_grid(. ~ cyl) +
  geom_text(
    data = df.sd.hp, 
    aes(label = paste0("SD: ", hp.sd)),
    x = 200, y = 0.75
  )
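
If you would rather not depend on dplyr, the same per-facet table can be built with base R. The following is a sketch under that assumption; df_sd_hp, hp_sd, and p_sd are illustrative names, and datasets::mtcars is used so the example starts from an unmodified copy of the data:

```r
library(ggplot2)

# Standard deviation of hp for each value of cyl, computed with base R
df_sd_hp <- aggregate(hp ~ cyl, data = datasets::mtcars, FUN = sd)
names(df_sd_hp)[2] <- "hp_sd"

# Build the label text once, outside the plotting call
df_sd_hp$label <- paste0("SD: ", round(df_sd_hp$hp_sd, 2))

p_sd <- ggplot(data = datasets::mtcars, aes(x = hp)) +
  geom_density() +
  facet_grid(. ~ cyl) +
  geom_text(
    data = df_sd_hp,
    aes(label = label),
    x = 200, y = 0.02   # fixed position in data units; adjust to your axes
  )

p_sd
```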

x6h2sr28 #2

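I prefer how the plot looks when the statistic appears in the facet label itself. I wrote the following script, which lets you choose between displaying the standard deviation, the mean, or the count. Essentially, it computes the summary statistic and then merges it with the category name, so that each label has the format CATEGORY (SUMMARY STAT = VALUE).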

#' Append a summary statistic to each category name, for use as a facet label
AddNameStat <- function(df, category, count_col,
                        stat = c("sd", "mean", "count"), dp = 0) {
  # Resolve the default to a single choice and reject unknown values
  stat <- match.arg(stat)

  # Create temporary data frame for analysis
  temp <- data.frame(ref = df[[category]], comp = df[[count_col]])

  # Aggregate by category and calculate the statistics
  # (plyr:: prefixes so the function works without attaching plyr)
  agg_stats <- plyr::ddply(temp, plyr::.(ref), plyr::summarize,
                           sd = sd(comp),
                           mean = mean(comp),
                           count = length(comp))

  # Replace the stat name with the symbol shown in the plot
  labelName <- plyr::mapvalues(stat, from = c("sd", "mean", "count"),
                               to = c("\u03C3", "x", "n"), warn_missing = FALSE)

  # Build the new name, e.g. "4 \n (σ = 21)"
  agg_stats$join <- paste0(agg_stats$ref, " \n (", labelName, " = ",
                           round(agg_stats[[stat]], dp), ")")

  # Map each row of df to the new name of its category
  name_map <- setNames(agg_stats$join, as.character(agg_stats$ref))
  return(name_map[as.character(df[[category]])])
}

Using this script with your original question:

library(ggplot2)
library(datasets)
data(mtcars)

# Overwrite cyl with the new labels (this modifies the local copy of mtcars)
mtcars$cyl  <- AddNameStat(mtcars, "cyl", "hp", stat = "sd")

ggplot(data = mtcars, aes(x = hp)) + 
  geom_dotplot(binwidth = 1) + 
  geom_density() + 
  facet_grid(. ~ cyl) + 
  theme_bw()

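The script should be easy to modify to include other summary statistics. I'm also sure it could be partly rewritten to make it a bit cleaner!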
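
One possible cleanup, sketched here as an assumption rather than as the answer's own method, is to leave the cyl column untouched and pass a lookup table to the labeller argument of facet_grid instead. The names sd_by_cyl, strip_labels, and p_strip are illustrative, and datasets::mtcars is used so the example starts from an unmodified copy of the data:

```r
library(ggplot2)

# Per-group SD of hp as a named vector; the names are "4", "6", "8"
sd_by_cyl <- tapply(datasets::mtcars$hp, datasets::mtcars$cyl, sd)

# Lookup table from facet value to strip label: value, newline, (σ = SD)
strip_labels <- setNames(
  paste0(names(sd_by_cyl), "\n(\u03C3 = ", round(sd_by_cyl, 0), ")"),
  names(sd_by_cyl)
)

p_strip <- ggplot(data = datasets::mtcars, aes(x = hp)) +
  geom_density() +
  # labeller() accepts a named character vector as a lookup table
  facet_grid(. ~ cyl, labeller = labeller(cyl = strip_labels)) +
  theme_bw()

p_strip
```

Because the data itself is not modified, cyl stays numeric for any later analysis, and the labels can be swapped for another statistic by changing only the tapply call.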
