R: How to create a long-format data frame from a 3D array (for a faceted histogram plot)

bmvo0sr5, posted 2023-04-09 in Other

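I'm running some simulation tests in which I vary each of two parameters ("x" and "y") over its own range and then compute the resulting error score for each parameter combination. I run this simulation many times, and I'd like to visualize the distribution of error scores for each x/y combination across the sims.
The code below does that, but the way I build the long-format data frame (for ggplot) is *way* too clunky and "manual". Yes, I use a pattern to create the x, y and z long-format column values from the array dimensions. But, ugh!
(Also, when I convert the array to the long-format data frame by going through a vector, I lose the array's dimension names.)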

### Create a 3D array of "errors"
# - The first two dimensions are for variation across params "x" and "y"
# - The third dimension, "z", represents sim runs for each x/y combination
dims <- c(2, 3, 50) 
# The means around which my simulated errors will vary
errorMeans <- 1:(prod(dims[1:2])) 
# Generate "errors" (varying around the errorMeans)
errorVec <- rnorm(prod(dims), mean=errorMeans, sd=0.4)
# My "starting point": A 3D array of error scores; the 3rd dim indexes the sims
errorArray <- array(errorVec, 
                    dims, 
                    dimnames=list(x=1:dims[1], y=1:dims[2], z=1:dims[3]))

### Create a long-form data frame from the 3D array
# Read the array into a vector
errorVec <- as.vector(errorArray)
# Write the vector to a long-form data frame (my approach: ugh!) 
dfLong <- data.frame(error=errorVec, 
                    x=rep(1:dims[1], prod(dims[2:3])), 
                    y=rep(rep(1:dims[2], each=dims[1]), dims[3]),
                    z=rep(1:dims[1], each=prod(dims[2:3])))

### Create a faceted histogram plot, showing error variation across the sims (the 3rd dim, "z")
library(ggplot2)
plt <- ggplot(data=dfLong, aes(x=error)) +
  geom_histogram(fill="steelblue") + 
  facet_grid(vars(x), vars(y))
plot(plt)

[Figure: Faceted histogram of variation in array dimension "z" for all combinations of dimensions "x" and "y"]
There must be a way to do this with, say, tidyr's pivot_longer(); I just don't know how to use it with a (3D) array as opposed to a matrix.

6jjcrrmo #1

Try:

df2 <- reshape2::melt(errorArray)

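This flattens the array to one row per cell and adds each dimension as a column (named after the dimnames), with the cell contents in a "value" column: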

## Your solution:
> head(dfLong)
      error x y z
1 0.7056645 1 1 1
2 1.7947472 2 1 1
3 2.2723746 1 2 1
4 4.0289590 2 2 1
5 4.9018582 1 3 1
6 5.3910886 2 3 1

## My solution:
> head(df2)
  x y z     value
1 1 1 1 0.7056645
2 2 1 1 1.7947472
3 1 2 1 2.2723746
4 2 2 1 4.0289590
5 1 3 1 4.9018582
6 2 3 1 5.3910886
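
If adding a package dependency is not wanted, a similar result can probably be had in base R with as.data.frame.table(). The following is only a sketch under assumptions: the seed and the name dfBase are chosen here for illustration and are not part of the question.

```r
# Self-contained sketch: rebuild an array shaped like the one in the question
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dims <- c(2, 3, 50)
errorArray <- array(rnorm(prod(dims), mean = 1:prod(dims[1:2]), sd = 0.4),
                    dims,
                    dimnames = list(x = 1:dims[1], y = 1:dims[2], z = 1:dims[3]))

# One row per array cell: the names of the dimnames become column names,
# and the cell values go into the column named by responseName
dfBase <- as.data.frame.table(errorArray, responseName = "error")

str(dfBase)
```

Note that the x, y and z columns come back as factors here, so the dimension labels are kept; if integers are needed, something like as.integer(as.character(dfBase$x)) should convert them.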

zf9nrax1 #2

Here is one way, explained in the code comments.

set.seed(2023)    # make results reproducible

dims <- c(2, 3, 50) 
# The means around which my simulated errors will vary
errorMeans <- 1:(prod(dims[1:2])) 
# Generate "errors" (varying around the errorMeans)
errorVec <- rnorm(prod(dims), mean=errorMeans, sd=0.4)
# My "starting point": A 3D array of error scores; the 3rd dim indexes the sims
errorArray <- array(errorVec, 
                    dims, 
                    dimnames=list(x=1:dims[1], y=1:dims[2], z=1:dims[3]))

# question's code
errorVec <- as.vector(errorArray)
# Write the vector to a long-form data frame (my approach: ugh!) 
dfLong <- data.frame(error=errorVec, 
                     x=rep(1:dims[1], prod(dims[2:3])), 
                     y=rep(rep(1:dims[2], each=dims[1]), dims[3]),
                     z=rep(1:dims[1], each=prod(dims[2:3])))

# create a data.frame of x, y, z values
dfRui <- do.call(expand.grid, lapply(dim(errorArray), seq))
dfRui <- cbind.data.frame(error = c(errorArray), dfRui)
names(dfRui)[-1] <- c("x", "y", "z")
# see what dfRui looks like
head(dfRui, n = 10)
#>        error x y z
#> 1  0.9664863 1 1 1
#> 2  1.6068225 2 1 1
#> 3  2.2499731 1 2 1
#> 4  3.9255421 2 2 1
#> 5  4.7466057 1 3 1
#> 6  6.4363190 2 3 1
#> 7  0.6345091 1 1 2
#> 8  2.4006559 2 1 2
#> 9  2.8402934 1 2 2
#> 10 3.8127508 2 2 2

# the z column is not needed for the plot, and the question's code builds it
# from 1:dims[1] instead of 1:dims[3], so compare every column except z
identical(dfLong[-4], dfRui[-4])
#> [1] TRUE

# plot code, copy & paste from the question
library(ggplot2)

plt <- ggplot(data = dfLong, aes(x = error)) +
  geom_histogram(fill = "steelblue") + 
  facet_grid(vars(x), vars(y))
plot(plt)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
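
A possible variant of the same idea, sketched under the assumption that the array carries numeric-looking dimnames as in the question: passing dimnames(errorArray) to expand.grid() takes both the column names and the labels from the array itself, so the names() step is not needed. The names idx and dfNamed are introduced here only for illustration.

```r
set.seed(2023)
dims <- c(2, 3, 50)
errorArray <- array(rnorm(prod(dims), mean = 1:prod(dims[1:2]), sd = 0.4),
                    dims,
                    dimnames = list(x = 1:dims[1], y = 1:dims[2], z = 1:dims[3]))

# Build the index columns from the array's own (named) dimnames; the first
# dimension varies fastest, matching the order in which c() flattens the array
idx <- expand.grid(dimnames(errorArray), stringsAsFactors = FALSE)
# dimnames are stored as character labels ("1", "2", ...); convert to integers
idx[] <- lapply(idx, as.integer)
dfNamed <- cbind(error = c(errorArray), idx)

# Sanity check: every row's error is the array cell addressed by (x, y, z)
cellIndex <- as.matrix(dfNamed[c("x", "y", "z")])
stopifnot(all(dfNamed$error == errorArray[cellIndex]))
```

The final check uses matrix indexing, which is a quick way to confirm that the x, y and z columns line up with the flattened values.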

Created on 2023-04-07 with reprex v2.0.2
