R: Create a presence-absence matrix from association data

but5z9lq, posted on 2023-01-22 in Other

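I have already looked at "Create a presence/absence column based on presence records [duplicate]", "Create a presence-absence matrix with presence on specific dates", and "Presence-absence matrix", but I keep running into problems with my species Association column.
Using a large longitudinal dataset on primate behavior, I created a species/association table. It has an extra column, variable, probably left over from my attempt to group by "community_id". A reproducible subset of my dataset is shown below.
dput() data: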

data <- structure(list(Species = c("BABO", "BW", "RC", "BW", "RC", "SKS", 
"SKS", "RC", "RC", "SKS", "BW", "RC", "RC", "RC", "RC", "SKS", 
"RC", "SKS", "SKS", "RC"), Association = c(NA, "SKS", NA, "RC", 
"BW", "SKS", NA, NA, NA, "BW", "SKS", NA, "SKS", "BW", "SKS", 
NA, NA, "SKS", NA, "MANG"), variable = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = "community_id", class = "factor"), community_id = c("2007-4-16.C3", 
"2007-4-16.C3", "2007-4-16.C3", "2007-4-17.Mwani", "2007-4-17.Mwani", 
"2007-4-17.Mwani", "2007-4-17.Mwani", "2007-4-18.Sanje", "2007-4-18.Sanje", 
"2007-4-18.Sanje", "2007-4-18.Sanje", "2007-5-8.C3", "2007-5-9.Mwani", 
"2007-5-9.Mwani", "2007-5-9.Mwani", "2007-5-10.Sanje", "2007-5-10.Sanje", 
"2007-6-6.C3", "2007-6-6.C3", "2007-6-6.C3")), row.names = c(NA, 
20L), class = "data.frame")

Printed output:

Species  Association  variable       community_id
   <chr>    <chr>        <chr>          <chr>
1   BABO    NA           community_id   2007-4-16.C3
2   BW      SKS          community_id   2007-4-16.C3
3   RC      NA           community_id   2007-4-16.C3
4   BW      RC           community_id   2007-4-17.Mwani
5   RC      BW           community_id   2007-4-17.Mwani
6   SKS     SKS          community_id   2007-4-17.Mwani
7   SKS     NA           community_id   2007-4-17.Mwani
8   RC      NA           community_id   2007-4-18.Sanje
9   RC      NA           community_id   2007-4-18.Sanje
10  SKS     BW           community_id   2007-4-18.Sanje
11  BW      SKS          community_id   2007-4-18.Sanje
12  RC      NA           community_id   2007-5-8.C3
13  RC      SKS          community_id   2007-5-9.Mwani
14  RC      BW           community_id   2007-5-9.Mwani
15  RC      SKS          community_id   2007-5-9.Mwani
16  SKS     NA           community_id   2007-5-10.Sanje
17  RC      NA           community_id   2007-5-10.Sanje
18  SKS     SKS          community_id   2007-6-6.C3
19  SKS     NA           community_id   2007-6-6.C3
20  RC      MANG         community_id   2007-6-6.C3

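I need a presence-absence matrix grouped by community_id. I have tried grouping by the "community_id" column, and I believe that is where I created the extra, seemingly irrelevant "variable" column. I am looking for the following output: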

community_id         BABO    BW     RC     SKS    Mang
<chr>                <chr>   <chr>  <chr>  <chr>  <chr>
2007-4-16.C3         1       1       1      1      0
2007-4-17.Mwani      0       1       1      1      0
2007-4-18.Sanje      0       1       1      1      0
2007-5-8.C3          0       0       1      0      0 
2007-5-9.Mwani       0       1       1      1      0
2007-5-10.Sanje      0       0       1      1      0
2007-6-6.C3          0       0       1      1      1

Any advice or help is much appreciated! Have a nice day.

bpzcxfmw #1

A solution using pivot_wider, after first combining the variables *Species* and *Association*.

library(tidyr)
library(dplyr)

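# Stack (Species, variable, community_id) on top of (Association, variable, community_id).
# rbind() keeps the first matrix's column names, so Association values land in "Species".
rbind(as.matrix(data[, -2]), as.matrix(data[, -1])) %>%
  as_tibble() %>% 
  distinct() %>%   # one row per community/species pair
  na.omit() %>%    # drop rows that came from NA associations
  pivot_wider(id_cols = community_id, names_from = Species, values_from = Species, 
    values_fn = function(x) any(unique(x) == x) * 1, values_fill = 0)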
# A tibble: 7 × 6
  community_id     BABO    BW    RC   SKS  MANG
  <chr>           <dbl> <dbl> <dbl> <dbl> <dbl>
1 2007-4-16.C3        1     1     1     1     0
2 2007-4-17.Mwani     0     1     1     1     0
3 2007-4-18.Sanje     0     1     1     1     0
4 2007-5-8.C3         0     0     1     0     0
5 2007-5-9.Mwani      0     1     1     1     0
6 2007-5-10.Sanje     0     0     1     1     0
7 2007-6-6.C3         0     0     1     1     1

eiee3dmh #2

You can do:

library(dplyr)

data %>%
  group_by(community_id) %>%
  summarize(as_tibble(t(sapply(c("BABO", "BW", "RC", "SKS", "Mang"),
                               function(x) as.numeric(x %in% Species)))))
#> # A tibble: 7 x 6
#>   community_id     BABO    BW    RC   SKS  Mang
#>   <chr>           <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2007-4-16.C3        1     1     1     0     0
#> 2 2007-4-17.Mwani     0     1     1     1     0
#> 3 2007-4-18.Sanje     0     1     1     1     0
#> 4 2007-5-10.Sanje     0     0     1     1     0
#> 5 2007-5-8.C3         0     0     1     0     0
#> 6 2007-5-9.Mwani      0     0     1     0     0
#> 7 2007-6-6.C3         0     0     1     1     0

Created on 2023-01-19 with reprex v2.0.2
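Note that the call above only tests `Species`, so a species that appears only in `Association` (for example SKS in 2007-4-16.C3) comes out as 0, and the hard-coded "Mang" never matches the "MANG" spelling used in the data. A minimal sketch of a variant that checks both columns and derives the species list from the data could look like this. The `toy` data frame is a hypothetical stand-in for the question's data, and `spc` and `res` are illustrative names.

```r
library(dplyr)

# Hypothetical toy data in the same shape as the question's data frame.
toy <- data.frame(
  community_id = c("A", "A", "B", "B"),
  Species      = c("BABO", "BW", "RC", "RC"),
  Association  = c(NA, "SKS", NA, "MANG"),
  stringsAsFactors = FALSE
)

# Every species code seen in either column, with NA removed.
spc <- unique(na.omit(c(toy$Species, toy$Association)))

# For each community, flag a species as present if it occurs
# in Species or in Association.
res <- toy %>%
  group_by(community_id) %>%
  summarize(as_tibble(t(sapply(spc, function(x)
    as.numeric(x %in% c(Species, Association))))))

res
```

Deriving `spc` from the data avoids spelling mismatches between the hard-coded names and the recorded codes.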

wgmfuz8q #3

Base R solution

Generate a table() from the unique() combinations of community_id and Species.

table(unique(data[c("community_id", "Species")]))
                  Species
community_id      BABO BW RC SKS
  2007-4-16.C3       1  1  1   0
  2007-4-17.Mwani    0  1  1   1
  2007-4-18.Sanje    0  1  1   1
  2007-5-10.Sanje    0  0  1   1
  2007-5-8.C3        0  0  1   0
  2007-5-9.Mwani     0  0  1   0
  2007-6-6.C3        0  0  1   1
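
The table above counts only the `Species` column, so MANG and the SKS association in 2007-4-16.C3 are missing. If associated species should count as present too, one possible base R sketch is to stack the two columns before tabulating. The `toy` data frame and the names `long` and `pa` are assumptions made for illustration.

```r
# Hypothetical toy data in the same shape as the question's data frame.
toy <- data.frame(
  community_id = c("A", "A", "B", "B"),
  Species      = c("BABO", "BW", "RC", "RC"),
  Association  = c(NA, "SKS", NA, "MANG"),
  stringsAsFactors = FALSE
)

# Stack Species and Association into a single long "Species" column.
long <- data.frame(
  community_id = rep(toy$community_id, 2),
  Species      = c(toy$Species, toy$Association),
  stringsAsFactors = FALSE
)

# Drop NA rows and duplicate pairs, then cross-tabulate.
pa <- table(unique(na.omit(long)))
pa
```

Because duplicates are removed first, every cell is either 0 or 1.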

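tidyverse solution

First subset to the distinct() values of community_id and Species, then create a Present variable that is set to 1 for every observation; then use pivot_wider() with the values_fill argument to add 0 for community-species combinations that were not observed.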
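
The answerer's own code is not reproduced here, so the following is only a minimal sketch of the steps just described, not the original code. It uses `pivot_wider()`, since `values_fill` is an argument of that function. The `toy` data frame is a hypothetical stand-in for the question's data, and `pa` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the two columns the steps rely on.
toy <- data.frame(
  community_id = c("A", "A", "B", "B"),
  Species      = c("BABO", "BW", "RC", "RC"),
  stringsAsFactors = FALSE
)

pa <- toy %>%
  distinct(community_id, Species) %>%   # one row per observed pair
  mutate(Present = 1) %>%               # mark every observed pair as present
  pivot_wider(names_from = Species,     # one column per species
              values_from = Present,
              values_fill = 0)          # unobserved pairs become 0

pa
```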

30byixjq #4

Basically, you want to check whether each species in the complete set spc is %in% the unique() species recorded for each community.

ufun <- \(x) unique(na.omit(unlist(x)))  ## helper function
# spc <- ufun(data$Association)  ## this might work on your complete data
spc <- c('BABO', 'BW', 'RC', 'SKS', 'Mang')  ## here hard coded

by(data[1:2], data$community_id, \(x) setNames(+(spc %in% ufun(x)), spc)) |>
  do.call(what=rbind)
#                 BABO BW RC SKS Mang
# 2007-4-16.C3       1  1  1   1    0
# 2007-4-17.Mwani    0  1  1   1    0
# 2007-4-18.Sanje    0  1  1   1    0
# 2007-5-10.Sanje    0  0  1   1    0
# 2007-5-8.C3        0  0  1   0    0
# 2007-5-9.Mwani     0  1  1   1    0
# 2007-6-6.C3        0  0  1   1    0
