R: Add new observations to a data frame with a character value in one column and zero in another

Posted by 8fq7wneg 12 months ago in Other

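I am currently working with a dataset of tree species density (trees per hectare, TPH) at specific sites. To calculate the mean density across all sites, I need to include zero values for species that are absent from a given site but present at other sites. Below is code that simulates a data frame similar to mine.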

# Define number of rows and levels for the grouping variable
set.seed(120)
n_rows <- 10
site_levels <- c("A", "B", "C", "D")

# Create a map of sites and species that can be absent
# (note: these names do not match site_levels, so absent_species[[site]]
# returns NULL in the loop below and no species is actually excluded)
absent_species <- list(
  North = c("Quercus alba", "Betula papyrifera"),
  South = c("Pinus strobus", "Tsuga canadensis"),
  East = c("Acer rubrum"),
  West = c("Quercus alba", "Tsuga canadensis")
)

# Define species pool and pre-fill an empty species vector for each site
species_pool <- c("Acer rubrum", "Quercus alba", "Pinus strobus", "Tsuga canadensis", "Betula papyrifera")
site_species <- setNames(lapply(site_levels, function(site) character(0)), site_levels)

# Simulate Site column and an empty Species column
data <- data.frame(Site = sample(site_levels, size = n_rows, replace = TRUE))
data$Species <- NA_character_

# Loop through rows and assign a random species per site
# (the same species can be drawn more than once for a site)
for (i in seq_len(n_rows)) {
  site <- data$Site[i]
  absent_list <- absent_species[[site]]
  species_pool_filtered <- setdiff(species_pool, absent_list)
  
  # Check if all species have been used at this site
  if (length(site_species[[site]]) == length(species_pool_filtered)) {
    # No more species available, skip this row
    next
  }
  
  # Choose a random species from the filtered pool
  species <- sample(species_pool_filtered, size = 1, replace = FALSE)
  
  # Assign species and add it to the site's list
  data$Species[i] <- species
  site_species[[site]] <- c(site_species[[site]], species)
}

# Simulate tree densities with some variation by site
data$TPH <- rnorm(n_rows, 
                  mean = c(500, 250, 100, 350)[match(data$Site, site_levels)],
                  sd = c(100, 50, 25, 75)[match(data$Site, site_levels)])

# Print the simulated dataframe
print(data)

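You will notice that not every species occurs at every site. This could usually be ignored, but the fact that these species are absent is important, so they should be added as new observations with a TPH of 0. Is there a simple way to add the species that are absent from a given site but present at other sites, assigning each new observation a TPH of 0?
I tried calculating the mean and standard error of density manually, simply dividing the sum of all densities by the total number of sites to account for the zero values at sites where a species is absent. I was able to compute the correct mean this way, but not the standard error.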
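A small sketch shows why the manual approach gets the mean right and how the standard error can still be recovered. Zeros add nothing to the sum or to the sum of squares, but they do count toward n, so the sample variance can be computed from the present values plus the total number of sites. The densities below are hypothetical, chosen only for illustration.

```r
# Hypothetical TPH values for one species at the sites where it occurs
tph_present <- c(480, 227)
n_sites <- 4  # total number of sites, including the two where it is absent

# Approach 1: explicitly add the zeros, then use mean() and sd()
tph_all <- c(tph_present, rep(0, n_sites - length(tph_present)))
mean_filled <- mean(tph_all)
se_filled <- sd(tph_all) / sqrt(n_sites)

# Approach 2: no zeros stored; absent sites only enter through n_sites
# sample variance = (sum(x^2) - n * mean^2) / (n - 1), and zeros add 0 to sum(x^2)
mean_manual <- sum(tph_present) / n_sites
var_manual <- (sum(tph_present^2) - n_sites * mean_manual^2) / (n_sites - 1)
se_manual <- sqrt(var_manual) / sqrt(n_sites)

print(c(mean = mean_manual, se = se_manual))
```

Both approaches give the same mean and standard error; filling in explicit zeros is simply less error-prone, which is what the answers below do.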

Answer 1 (mspsb9vt)

One possible approach is to use dplyr::right_join. First, define a frame containing all possible combinations of sites and species:

# All Site x Species combinations (every species in the pool, as character columns)
comb <- expand.grid(Site = site_levels,
                    Species = species_pool,
                    stringsAsFactors = FALSE)

Then use right_join to create all the missing combinations in the target frame, finishing with dplyr::mutate and tidyr::replace_na to replace NA with 0:

library(dplyr)
library(tidyr)

data1 <- data %>%
  right_join(comb, by = c("Site", "Species")) %>%
  mutate(TPH = replace_na(TPH, 0))
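For readers not using dplyr, the same right-join idea can be sketched in base R with merge(..., all.y = TRUE), which keeps every row of the second data frame and fills unmatched columns with NA. This is a swapped-in base R alternative rather than part of the original answer, and the small obs data frame is hypothetical.

```r
# Small hypothetical data frame: species observed at each site
obs <- data.frame(
  Site    = c("A", "A", "B"),
  Species = c("Quercus alba", "Pinus strobus", "Tsuga canadensis"),
  TPH     = c(480, 552, 238),
  stringsAsFactors = FALSE
)

# Every Site x Species combination that should appear in the result
all_comb <- expand.grid(Site = unique(obs$Site),
                        Species = unique(obs$Species),
                        stringsAsFactors = FALSE)

# all.y = TRUE keeps every row of all_comb (like a right join);
# combinations missing from obs get NA for TPH
filled <- merge(obs, all_comb, by = c("Site", "Species"), all.y = TRUE)

# Absent species get a density of 0
filled$TPH[is.na(filled$TPH)] <- 0
print(filled)
```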

Answer 2 (jtw3ybtb)

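This could be a good opportunity to use complete from tidyr.
Here you can specify that you want all combinations of Site and Species, and then use fill to set TPH to 0 for the missing combinations.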

library(tidyr)

data |>
  complete(Site, Species, fill = list(TPH = 0))


Output

   Site  Species             TPH
   <chr> <chr>             <dbl>
 1 A     Acer rubrum          0 
 2 A     Betula papyrifera    0 
 3 A     Pinus strobus      552.
 4 A     Quercus alba       480.
 5 A     Tsuga canadensis   456.
 6 B     Acer rubrum          0 
 7 B     Betula papyrifera  227.
 8 B     Pinus strobus        0 
 9 B     Quercus alba         0 
10 B     Tsuga canadensis   238.
# ℹ 11 more rows
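Since the goal in the question is a mean and standard error per species across all sites, the completed data can be summarised directly. The sketch below assumes dplyr and tidyr are installed and uses a small hypothetical tibble. Passing Species = species_pool to complete() (tidyr lets you supply a vector of values for a column) also adds species that were never observed at any site, which plain complete(Site, Species) would not do.

```r
library(dplyr)
library(tidyr)

# Hypothetical observations (not the simulated data from the question)
obs <- tibble(
  Site    = c("A", "A", "B", "C"),
  Species = c("Quercus alba", "Pinus strobus", "Quercus alba", "Pinus strobus"),
  TPH     = c(480, 552, 227, 100)
)

# Species to report even if never observed (assumed list for this example)
species_pool <- c("Quercus alba", "Pinus strobus", "Acer rubrum")

summary_tbl <- obs |>
  # one row per Site x Species; missing combinations get TPH = 0
  complete(Site, Species = species_pool, fill = list(TPH = 0)) |>
  group_by(Species) |>
  summarise(mean_TPH = mean(TPH),
            se_TPH   = sd(TPH) / sqrt(n()),
            n_sites  = n(),
            .groups  = "drop")

print(summary_tbl)
```

Because every species now has one row per site, n() equals the number of sites, so the standard error accounts for the zeros without any manual bookkeeping.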
