Calculating means with phantom zeros using dplyr

ee7vknir posted on 2023-04-27 in Other

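I'm trying to calculate summary statistics for tree volume data from a standard forest "cruise". Basically, we measure tree volume on several randomly located circular plots within a delineated area of forest called a "stand". I want to report summary statistics grouped by species. Some species are rare in the stand and occur on only a few plots. We don't need to record any data for these species on the plots where they don't occur, but technically we observed a volume of 0 for them on those plots. So the mean should properly be calculated as the sum of a species' volume divided by the total number of plots in the cruise. When I use dplyr::summarize(), however, mean() divides the summed volume only by the number of records of that species, so the plots where it was absent are left out. For example: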

##Loading Necessary Packages##
library(dplyr) 
library(tidyr)

#For Reproducibility#
set.seed(63)

#Creating Fake Data#
plots<-c(1:25)
species<-c("PSME", "PSME", "PSME", "TSHE", "PSME", "PSME",  "PSME", "THPL", "PIMO3", "ACMA3")
trees<-data.frame(plot_id=rep(plots,5), species=sample(species, 125, replace=TRUE), vol=rnorm(125, 3250, 30)) %>% 
  arrange(plot_id)

#Calculating Mean volume by species
trees %>% group_by(species) %>% summarize(across(.cols=2,~mean(.x)))
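To make the difference concrete, here is a small sketch on a hand-built dataset. The `toy` data frame and the `n_plots_total` name are hypothetical and not part of the cruise data. One rare species, THPL, is recorded on only one of four plots:

```r
library(dplyr)

# Hypothetical toy cruise: 4 plots, THPL tallied only on plot 1
toy <- data.frame(
  plot_id = c(1, 1, 2, 3, 4),
  species = c("PSME", "THPL", "PSME", "PSME", "PSME"),
  vol     = c(10, 8, 12, 14, 16)
)

# Number of plots in the (toy) cruise
n_plots_total <- n_distinct(toy$plot_id)

toy_summary <- toy %>%
  group_by(species) %>%
  summarize(
    mean_per_record = mean(vol),                # divides by the records of that species
    mean_per_plot   = sum(vol) / n_plots_total  # treats plots without the species as 0
  )
print(toy_summary)
```

For THPL, `mean_per_record` is 8 while `mean_per_plot` is 2 (8 / 4). For PSME both are 13, because it occurs exactly once on every plot.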

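Is there an option I can pass to dplyr::summarize() so that it calculates the mean as the sum of a given species' volume divided by the total number of plots in the cruise? Below is my somewhat ugly workaround that gets me the result I want.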

##Continued from previous chunk of code##

#create a sparse matrix using pivot_wider()#
trees_sparse<-trees %>% pivot_wider(id_cols = plot_id, names_from=species, values_from = vol, values_fn=sum)

#Change NAs to Zeros#
trees_sparse[is.na(trees_sparse)]<-0

#Convert back to long format retaining zeros#
trees_zeros<-trees_sparse %>% pivot_longer(cols = -1, names_to = "Species", values_to = "Volume")

#Desired summary statistics#
trees_zeros %>%  group_by(Species) %>%  summarize(across(.cols=2, ~mean(.x)))
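The separate NA-to-zero step can likely be folded into `pivot_wider()` itself through its `values_fill` argument (the scalar form `values_fill = 0` is accepted by current tidyr releases; older ones expected a named list such as `list(vol = 0)`). A sketch of the same workaround under that assumption, with the data rebuilt so the block runs on its own:

```r
library(dplyr)
library(tidyr)

set.seed(63)
plots <- 1:25
species <- c("PSME", "PSME", "PSME", "TSHE", "PSME", "PSME", "PSME", "THPL", "PIMO3", "ACMA3")
trees <- data.frame(
  plot_id = rep(plots, 5),
  species = sample(species, 125, replace = TRUE),
  vol = rnorm(125, 3250, 30)
) %>%
  arrange(plot_id)

# Wide plot x species table: volumes summed per plot,
# missing plot/species combinations filled with 0 instead of NA
trees_sparse <- trees %>%
  pivot_wider(id_cols = plot_id, names_from = species,
              values_from = vol, values_fn = sum, values_fill = 0)

# Back to long format; every plot now has one row per species
trees_zeros <- trees_sparse %>%
  pivot_longer(cols = -plot_id, names_to = "Species", values_to = "Volume")

species_means <- trees_zeros %>%
  group_by(Species) %>%
  summarize(mean_vol = mean(Volume))
print(species_means)
```

Naming the column in `summarize()` (here `mean_vol = mean(Volume)`) also avoids relying on the column position in `across(.cols = 2, ...)`, which silently changes if the columns are reordered.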

Answer #1 (vlju58qv)

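You can use tidyr::complete to fill in the missing species-plot combinations, but you then need an extra step to re-aggregate to the plot-species level. For a mean in particular, you can also avoid that step by calculating the numerator and the denominator separately.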

library(dplyr) 
library(tidyr)

set.seed(63)

plots <- c(1:25)
species <- c("PSME", "PSME", "PSME", "TSHE", "PSME", "PSME", "PSME", "THPL", "PIMO3", "ACMA3")
trees <- data.frame(
  plot_id = rep(plots, 5),
  species = sample(species, 125, replace = TRUE),
  vol = rnorm(125, 3250, 30)
) %>%
  arrange(plot_id)

# This is a tidyr way to do your second block:
trees %>%
  complete(plot_id, species, fill = list(vol = 0)) %>%
  group_by(plot_id, species) %>%
  summarise(plot_vol = sum(vol)) %>%
  group_by(species) %>%
  summarise(avg_vol = mean(plot_vol))
#> # A tibble: 5 × 2
#>   species avg_vol
#>   <chr>     <dbl>
#> 1 ACMA3      908.
#> 2 PIMO3      780.
#> 3 PSME     11550.
#> 4 THPL      1041.
#> 5 TSHE      1957.

# Or you could separately calculate the sum and the denominator:
trees %>%
  # n_plots is computed before any grouping, so n_distinct() counts every plot in the data
  group_by(species, n_plots = n_distinct(plot_id)) %>%
  summarize(sum_vol = sum(vol)) %>%
  mutate(mean_vol = sum_vol / n_plots)
#> # A tibble: 5 × 4
#> # Groups:   species [5]
#>   species n_plots sum_vol mean_vol
#>   <chr>     <int>   <dbl>    <dbl>
#> 1 ACMA3        25  22695.     908.
#> 2 PIMO3        25  19501.     780.
#> 3 PSME         25 288743.   11550.
#> 4 THPL         25  26030.    1041.
#> 5 TSHE         25  48935.    1957.

Created on 2023-04-20 with reprex v2.0.2
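Both approaches above take the set of plots from the tree records themselves, so a plot on which no trees were tallied at all would silently drop out of the denominator. A minimal sketch, assuming the full list of plot IDs is known separately: the hypothetical `all_plots` vector below pretends the cruise had a 26th plot with no trees. Aggregating to plot-species totals first and then calling `complete()` with that vector keeps a zero for every plot, which also lets other statistics such as `sd()` see the zeros; the sum/denominator shortcut only works for the mean.

```r
library(dplyr)
library(tidyr)

set.seed(63)
plots <- 1:25
species <- c("PSME", "PSME", "PSME", "TSHE", "PSME", "PSME", "PSME", "THPL", "PIMO3", "ACMA3")
trees <- data.frame(
  plot_id = rep(plots, 5),
  species = sample(species, 125, replace = TRUE),
  vol = rnorm(125, 3250, 30)
)

# Hypothetical: the cruise actually had 26 plots and plot 26 had no trees
all_plots <- 1:26

plot_totals <- trees %>%
  # one total per plot x species that was actually tallied
  group_by(plot_id, species) %>%
  summarise(plot_vol = sum(vol), .groups = "drop") %>%
  # add every (plot, species) pair from all_plots, filling absences with 0
  complete(plot_id = all_plots, species, fill = list(plot_vol = 0))

species_stats <- plot_totals %>%
  group_by(species) %>%
  summarise(
    n_plots  = n(),
    mean_vol = mean(plot_vol),
    sd_vol   = sd(plot_vol)
  )
print(species_stats)

# The sum/denominator shortcut gives the same means when the
# denominator is the known plot count instead of n_distinct(plot_id)
shortcut <- trees %>%
  group_by(species) %>%
  summarise(mean_vol = sum(vol) / length(all_plots))
print(shortcut)
```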
