Using hexbins to show the proportions of a categorical variable (like hextri)

3wabscal, posted 2023-07-31 in Other

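Following suggestions from this site, I have built a hexbin plot in ggplot that shows the count of data points in each bin and highlights specific bins of interest.
Now I would like to extend this plot to show the proportion of a second grouping category within each hexbin. The hextri package can already do this, but I have not managed to combine the ggplot solution from my previous question with the output of the hextri package.
The end goal is a plot that looks like the hextri output, with the cells of interest highlighted.
Below is some example data and code that creates a ggplot with highlighted cells and a hextri plot showing the class proportions. These are the two features I would like to combine.
I have tried using the border argument of the hextri function to get the desired result, but without success so far.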

library(hextri)
library(ggplot2)

n = 100

df = data.frame(x = rnorm(n), 
                y = rnorm(n),
                group = sample(0:1, n, prob = c(0.9, 0.1), replace = TRUE))

# hextri plot
hextri_plot = hextri(
  df$x,
  df$y,
  class = df$group,
  colour = c("#00b38a", "#ea324c"),
  nbins = 3,
  diffuse = FALSE, 
  sorted = TRUE
) 

# GGplot
ggplot(df, aes(x = x, y = y)) +
  geom_hex() +
  stat_summary_hex(aes(
    z = group,
    color = after_stat(as.character(value))
  ), fun = ~ +any(.x == 1), fill = NA) +
  scale_color_manual(
    values = c("0" = "transparent", "1" = "yellow"),
    guide = "none"
  )



Answer 1 (vq8itlhq)

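This is not a trivial problem. It requires writing a new Geom, a new Stat, and a new grob (see below). Personally, I am not convinced this is a great data visualization option, since it distorts positions and involves significant rounding error. However, it is visually appealing and fairly intuitive, so I went ahead and wrote a geom_hextri anyway. To make it work, we simply map one of its aesthetics to a categorical variable, and it should behave as expected.
Let's use your own example data: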

set.seed(1)
n = 100

df = data.frame(x = rnorm(n), 
                y = rnorm(n),
                group = sample(0:1, n, prob = c(0.9, 0.1), replace = TRUE))

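and plot it with geom_hextri using your chosen colour scheme. We will overlay the points so that we can check that the logic of the segment fills matches the points.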

ggplot(df, aes(x, y, fill = factor(group), color = factor(group))) + 
  geom_hextri(linewidth = 0.3, bins = 4) + 
  geom_point(shape = 21, size = 3, color = "black") +
  coord_equal() + 
  theme_classic(base_size = 16) + 
  theme(aspect.ratio = 1) +
  scale_fill_manual("Group", values =  c("#00b38a", "#ea324c")) +
  scale_color_manual("Group", values =  c("#00b38a", "#ea324c"))


Note that we can easily change the bin size and appearance if we wish. To get solid hexagons around the triangles, we simply add a geom_hex layer:

ggplot(df, aes(x, y, fill = factor(group))) + 
  geom_hextri(color = "black", linewidth = 0.1, bins = 5) + 
  geom_point(shape = 21, size = 3) +
  geom_hex(fill = NA, color = "black", linewidth = 1, bins = 5) +
  coord_equal() + 
  theme_classic(base_size = 16) + 
  theme(aspect.ratio = 1) +
  scale_fill_manual("Group", values = c("gray", "red"))



Applied to another data set, we get:

ggplot(iris, aes(Sepal.Width, Sepal.Length, fill = Species)) + 
  geom_hextri(color = "white", linewidth = 0.1, bins = 5) + 
  geom_point(shape = 21, size = 3, position = position_jitter(0.03, 0.03),
             color = "white") +
  geom_hex(fill = NA, colour = NA, linewidth = 1, bins = 5) +
  coord_equal() + 
  theme_minimal(base_size = 20) + 
  theme(aspect.ratio = 1) +
  scale_fill_brewer(palette = "Set2")

Note also that we do not have to use the fill aesthetic. For example, we could vary only the outline colour:

ggplot(iris, aes(Sepal.Width, Sepal.Length, colour = Species)) + 
  geom_hextri(fill = NA, linewidth = 2, bins = 5, alpha = 1) + 
  geom_hex(fill = NA, colour = NA, linewidth = 1, bins = 5) +
  coord_equal() + 
  theme_minimal(base_size = 20) + 
  theme(aspect.ratio = 1) +
  scale_colour_brewer(palette = "Set1")

The geom_hextri code

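Now for the hard part: the implementation of geom_hextri. I have tried to break it into chunks, but the code is necessarily long and hard to explain in great detail. I have also had to sacrifice a little spacing so that it fits in the code boxes without scrolling.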

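Ultimately, ggplot has to draw objects on the plotting device as graphical objects (grobs), but there is no ready-made grob that draws these hexagon segments. We therefore need to define a function that draws them with grid::polygonGrob, given the appropriate x, y co-ordinates, height, width, graphical parameters, and the segment we are dealing with. It needs to accept vectorised data to work with ggplot: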

hextriGrob <- function(x, y, seg, height, width, gp = grid::gpar()) {

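  # Split the vectorised gpar into one gpar per polygon
  gp <- lapply(seq_along(x), function(i) structure(gp[i], class = "gpar"))
  # x positions of the left and right edges of the hexagon
  xl  <- x - width
  xr  <- x + width
  # y positions: top vertex, upper and lower side corners, bottom vertex
  y1  <- y + 2 * height
  y2  <- y + height
  y3  <- y - height
  y4  <- y - 2 * height
  pg  <- grid::polygonGrob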
  
  do.call(grid::gList, 
    Map(function(x, y, xl, xr, y1, y2, y3, y4, seg, gp) {
      if(seg == 1) return(pg(x = c(x, x, xr, x),  y = c(y, y1, y2, y), gp = gp))
      if(seg == 2) return(pg(x = c(x, xr, xr, x), y = c(y, y2, y3, y), gp = gp))
      if(seg == 3) return(pg(x = c(x, xr, x, x),  y = c(y, y3, y4, y), gp = gp))
      if(seg == 4) return(pg(x = c(x, x, xl, x),  y = c(y, y4, y3, y), gp = gp))
      if(seg == 5) return(pg(x = c(x, xl, xl, x), y = c(y, y3, y2, y), gp = gp))
      if(seg == 6) return(pg(x = c(x, xl, x, x),  y = c(y, y2, y1, y), gp = gp))
  }, x = x, y = y, xl = xl, xr = xr, y1 = y1, 
     y2 = y2, y3 = y3, y4 = y4, seg = seg, gp = gp))
}
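
To make the segment numbering easier to follow, here is a minimal base R sketch of the same geometry. The helper name hex_segment_vertices is hypothetical and not part of the answer; it only returns the three corners of one segment, using the same offsets as hextriGrob (segments are numbered clockwise, starting at the top right).

```r
# Hypothetical helper (not part of the answer): the three vertices of one
# triangular segment of a pointy-topped hexagon centred on (x, y).
hex_segment_vertices <- function(x, y, seg, height, width) {
  stopifnot(seg %in% 1:6)
  # The six corners, clockwise from the top, built from the same offsets as
  # hextriGrob: +/- width in x, +/- height and +/- 2 * height in y.
  corners_x <- c(x, x + width, x + width, x, x - width, x - width)
  corners_y <- c(y + 2 * height, y + height, y - height,
                 y - 2 * height, y - height, y + height)
  nxt <- seg %% 6 + 1  # the corner after `seg`, wrapping from 6 back to 1
  data.frame(x = c(x, corners_x[seg], corners_x[nxt]),
             y = c(y, corners_y[seg], corners_y[nxt]))
}

# All six segments of a regular hexagon (width = sqrt(3) * height)
segs <- lapply(1:6, hex_segment_vertices,
               x = 0, y = 0, height = 1, width = sqrt(3))
```

Each segment starts at the centre and spans two neighbouring corners, so the six triangles tile the hexagon exactly.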


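But this is not enough on its own. We also need to define a geom that inherits from GeomHex but has its own draw_group method, which calls our hextriGrob function appropriately. Part of its job is to make sure the aesthetics are correctly split into segments, which for technical reasons cannot easily be done in the Stat layer.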

GeomHextri <- ggproto("GeomHextri", GeomHex,
  draw_group = function (self, data, panel_params, coord, lineend = "butt",
                         linejoin = "mitre", linemitre = 10) {
    table_six <- function(vec) {
      if(!is.factor(vec)) vec <- factor(vec)
      tab <- round(6 * table(vec, useNA = "always")/length(vec))
      n <- diff(c(0, findInterval(cumsum(tab) / sum(tab), 1:6/6)))
      rep(names(tab), times = n)
    }
    num_cols <- sapply(data, is.numeric)
    non_num_cols <- names(data)[!num_cols]
    num_cols <- names(data)[num_cols]
    datasplit <- split(data, interaction(data$x, data$y, drop = TRUE))
    data <- do.call("rbind", lapply(seq_along(datasplit), function(i) {
      num_list <- lapply(datasplit[[i]][num_cols], function(x) rep(mean(x), 6))
      non_num_list <- lapply(datasplit[[i]][non_num_cols], function(x) {
        table_six(rep(x, times = datasplit[[i]]$count))})
      d <- datasplit[[i]][rep(1, 6),]
      d[num_cols] <- num_list
      d[non_num_cols] <- non_num_list
      d$tri <- 1:6
      d$group <- i
      d}))
    # snake_class() is internal to ggplot2, so it needs the ::: prefix too
    data <- ggplot2:::check_linewidth(data, ggplot2:::snake_class(self))
    if (ggplot2:::empty(data))  return(zeroGrob())
    coords <- coord$transform(data, panel_params)
    hw <- c(min(diff(unique(sort(coords$x)))), 
            min(diff(unique(sort(coords$y))))/3)
    hextriGrob(coords$x, coords$y, data$tri, hw[2], hw[1],
      gp = grid::gpar(col = data$colour, fill = alpha(data$fill, data$alpha),
                      lwd = data$linewidth * .pt, lty = data$linetype,
                      lineend = lineend, linejoin = linejoin,
                      linemitre = linemitre))})
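
The rounding mentioned at the start of the answer comes from table_six, which has to share exactly six segments between the classes in a bin. The sketch below copies that helper out of draw_group so it can be run on its own; the input vectors are made up for illustration.

```r
# Copied from the draw_group method above so that it runs standalone.
table_six <- function(vec) {
  if (!is.factor(vec)) vec <- factor(vec)
  # Share of each class, expressed in sixths and rounded to whole segments
  tab <- round(6 * table(vec, useNA = "always") / length(vec))
  # Rescale so that the segment counts always add up to six
  n <- diff(c(0, findInterval(cumsum(tab) / sum(tab), 1:6 / 6)))
  rep(names(tab), times = n)
}

# A 90/10 split: 5.4 and 0.6 sixths round to 5 and 1 segments.
skewed <- table_six(rep(c("a", "b"), times = c(9, 1)))

# A 97/3 split: the minority class rounds to zero segments and disappears.
rare <- table_six(rep(c("a", "b"), times = c(97, 3)))
```

So a class that makes up only a few percent of a bin may not be drawn at all, which is worth keeping in mind when reading these plots.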


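Before our data reaches this geom, it needs to be binned into hexagons. Unfortunately, the existing StatBinhex cannot do this while preserving the individual segment-level aesthetic detail we need, so we have to write our own binning function: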

hexify <- function (x, y, z, xbnds, ybnds, xbins, ybins, binwidth,
                    fun = mean, fun.args = list(),
                    drop = TRUE) {

  hb <- hexbin::hexbin(x, xbnds = xbnds, xbins = xbins, y,
                       ybnds = ybnds, shape = ybins/xbins, IDs = TRUE)
  value <- rlang::inject(tapply(z, hb@cID, fun, !!!fun.args))
  out <- hexbin::hcell2xy(hb)
  out <- ggplot2:::data_frame0(!!!out)
  out$value <- as.vector(value)
  out$width <- binwidth[1]
  out$height <- binwidth[2]
  if (drop) out <- stats::na.omit(out)
  out
}
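
As a rough illustration of what hexify does, the sketch below bins some random points with the hexbin package and sums a weight per cell. It is a simplified version under stated assumptions: it uses hexbin's default bounds and shape rather than the xbnds, ybnds and shape values that hexify receives from the Stat, and the data are made up.

```r
library(hexbin)

set.seed(1)
x <- rnorm(100)
y <- rnorm(100)
w <- rep(1, 100)  # one unit of weight per point, as when no weight is mapped

# IDs = TRUE records, for every point, the id of the cell it fell into (hb@cID)
hb <- hexbin(x, y, xbins = 4, IDs = TRUE)

# Sum the weights within each occupied cell, then attach the cell centres
cell_weight <- tapply(w, hb@cID, sum)
centres <- hcell2xy(hb)
binned <- data.frame(x = centres$x, y = centres$y,
                     count = as.vector(cell_weight))
```

With unit weights the summed value is simply the number of points in each cell, which is what the Stat later exposes as count.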


This must then be used in a custom Stat:

StatHextri <- ggproto("StatHextri", StatBinhex,
  default_aes = aes(weight = 1, alpha = after_stat(count)),
  compute_panel = function (self, data, scales, ...) {
    if (ggplot2:::empty(data)) return(ggplot2:::data_frame0())
    data$group <- 1
    self$compute_group(data = data, scales = scales, ...)},
  compute_group = function (data, scales, binwidth = NULL, bins = 30,
                            na.rm = FALSE){
    `%||%` <- rlang::`%||%`
    rlang::check_installed("hexbin", reason = "for `stat_binhex()`")
    binwidth <- binwidth %||% ggplot2:::hex_binwidth(bins, scales)
    if (length(binwidth) == 1) binwidth <- rep(binwidth, 2)
    wt <- data$weight %||% rep(1L, nrow(data))
    non_pos <- !names(data) %in% c("x", "y", "PANEL", "group")
    is_num  <- sapply(data, is.numeric)
    aes_vars <- names(data)[non_pos & !is_num]
    grps <- do.call("interaction", c(as.list(data[aes_vars]), drop = TRUE))
    xbnds <- ggplot2:::hex_bounds(data$x, binwidth[1])
    xbins <- diff(xbnds)/binwidth[1]
    ybnds <- ggplot2:::hex_bounds(data$y, binwidth[2])
    ybins <- diff(ybnds)/binwidth[2]
    do.call("rbind", Map(function(data, wt) {
      out <- hexify(data$x, data$y, wt, xbnds, ybnds, xbins,
                    ybins, binwidth, sum)
      for(var in aes_vars) out[[var]] <- data[[var]][1]
      out$density <- as.vector(out$value/sum(out$value, na.rm = TRUE))
      out$ndensity <- out$density/max(out$density, na.rm = TRUE)
      out$count <- out$value
      out$ncount <- out$count/max(out$count, na.rm = TRUE)
      out$value <- NULL
      out$group <- 1
      out}, split(data, grps), split(wt, grps)))})
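
The key step in compute_group is the grouping: every non-positional, non-numeric column is treated as a discrete aesthetic, and the data are split by the combinations of those columns before each piece is binned against the same bounds. A minimal sketch of that step on a made-up data frame (the column values are purely illustrative):

```r
# Toy layer data: `fill` and `colour` stand in for discrete aesthetics
d <- data.frame(
  x      = c(0.1, 0.2, 0.3, 0.4),
  y      = c(1.0, 1.1, 1.2, 1.3),
  fill   = c("a", "b", "a", "a"),
  colour = c("r", "r", "g", "r")
)

# Non-positional, non-numeric columns are treated as grouping aesthetics
non_pos  <- !names(d) %in% c("x", "y", "PANEL", "group")
is_num   <- sapply(d, is.numeric)
aes_vars <- names(d)[non_pos & !is_num]

# One factor level per observed combination; unused combinations are dropped
grps   <- do.call("interaction", c(as.list(d[aes_vars]), drop = TRUE))
pieces <- split(d, grps)
```

Because interaction() needs at least one factor, this step appears to rely on at least one discrete aesthetic being mapped, which matches the advice above to map an aesthetic to a categorical variable.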


Finally, we need to write a geom function so that we can easily call all of the above from within a ggplot call:

geom_hextri <- function(
    mapping     = aes(),
    data        = NULL,
    stat        = "hextri",
    position    = "identity",
    na.rm       = FALSE,
    show.legend = NA,
    inherit.aes = TRUE,
    bins        = 10,
    ...) {
  
      ggplot2::layer(
        geom        = GeomHextri,
        data        = data,
        mapping     = mapping,
        stat        = stat,
        position    = position,
        show.legend = show.legend,
        inherit.aes = inherit.aes,
        params      = list(na.rm = na.rm, bins = bins, ...)
      )
  }
