How to account for 'NA' when counting with 'group_by()' and two factor variables in R

Posted by guicsvcw on 2023-01-15 in Other

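I have a data frame df1 containing different sites (df1$Site) that belong to different regions (df1$Region), with data on evidence of herbivory and its type (df1$Herbivory_type). When there is no herbivory, df1$Herbivory_type is NA. Below is an example of my data frame: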

df1 <- data.frame(Region=c("ALI1","ALI1","ALI1","ALI1","ALI2","ALI2","ALI2","ALI3","ALI3","ALI3","ALI3","ALI5","ALI5"),
                  Site=c("ALI1_A","ALI1_B","ALI1_C","ALI1_D","ALI2_A","ALI2_B","ALI2_C","ALI3_A","ALI3_B","ALI3_C","ALI3_D","ALI5_A","ALI5_B"),
                  Herbivory_type=c(NA,"S",NA,NA,NA,NA,NA,NA,"S","S",NA,NA,"S"))

df1$Herbivory_type <- as.factor(df1$Herbivory_type)

df1

   Region   Site Herbivory_type
1    ALI1 ALI1_A           <NA>
2    ALI1 ALI1_B              S
3    ALI1 ALI1_C           <NA>
4    ALI1 ALI1_D           <NA>
5    ALI2 ALI2_A           <NA>
6    ALI2 ALI2_B           <NA>
7    ALI2 ALI2_C           <NA>
8    ALI3 ALI3_A           <NA>
9    ALI3 ALI3_B              S
10   ALI3 ALI3_C              S
11   ALI3 ALI3_D           <NA>
12   ALI5 ALI5_A           <NA>
13   ALI5 ALI5_B              S

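I need the number of herbivory events per region, where a site whose Herbivory_type is NA counts as no event (so a region in which every site is NA should get 0). I would like to get the following result: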

df2

   Region N_Herbivory_S
1   ALI1             1
2   ALI2             0   # All sites are `NA`, so herbivory is 0 in this region.
3   ALI3             2
4   ALI5             1

I tried this:

library(dplyr)

as.data.frame(df1 %>% group_by(Region, Herbivory_type) %>% summarise(N = n()))

But the output is not what I expected:

  Region Herbivory_type N
1   ALI1              S 1
2   ALI1           <NA> 3
3   ALI2           <NA> 3
4   ALI3              S 2
5   ALI3           <NA> 2
6   ALI5              S 1
7   ALI5           <NA> 1
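Grouping by both columns makes NA its own group, so a region with no "S" at all (ALI2) never produces an "S" row. As a base R cross-check (an alternative, not one of the posted answers), a minimal sketch with table() shows the shape the question is after: table() drops NA by default and keeps every Region x level combination, including zero counts. The names tab and df2 are only illustrative.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  Region = c("ALI1","ALI1","ALI1","ALI1","ALI2","ALI2","ALI2",
             "ALI3","ALI3","ALI3","ALI3","ALI5","ALI5"),
  Site = c("ALI1_A","ALI1_B","ALI1_C","ALI1_D","ALI2_A","ALI2_B","ALI2_C",
           "ALI3_A","ALI3_B","ALI3_C","ALI3_D","ALI5_A","ALI5_B"),
  Herbivory_type = factor(c(NA,"S",NA,NA,NA,NA,NA,NA,"S","S",NA,NA,"S"))
)

# table() ignores NA by default (useNA = "no") but keeps empty cells,
# so ALI2 appears with a count of 0 for level "S"
tab <- table(Region = df1$Region, Herbivory_type = df1$Herbivory_type)

# Convert to a data frame with columns Region, Herbivory_type, Freq
df2 <- as.data.frame(tab)
print(df2)
```

This only works cleanly here because the factor has a single level ("S"); with more levels you would get one row per Region and level.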

Does anyone know how to do this?
Thanks in advance.


goucqfw6 (answer #1)

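You can use count() with wt = !is.na(Herbivory_type) to sum, by group, whether each value is non-missing, which gives the number of non-missing values per region.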

library(dplyr)

df1 %>%
  count(Region, wt = !is.na(Herbivory_type))

# # A tibble: 4 × 2
#   Region     n
#   <chr>  <int>
# 1 ALI1       1
# 2 ALI2       0
# 3 ALI3       2
# 4 ALI5       1
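count() names its result column n by default. A small sketch, assuming dplyr's name argument to count(), that produces the column name used in the desired df2 directly:

```r
library(dplyr)

df1 <- data.frame(
  Region = c("ALI1","ALI1","ALI1","ALI1","ALI2","ALI2","ALI2",
             "ALI3","ALI3","ALI3","ALI3","ALI5","ALI5"),
  Site = c("ALI1_A","ALI1_B","ALI1_C","ALI1_D","ALI2_A","ALI2_B","ALI2_C",
           "ALI3_A","ALI3_B","ALI3_C","ALI3_D","ALI5_A","ALI5_B"),
  Herbivory_type = factor(c(NA,"S",NA,NA,NA,NA,NA,NA,"S","S",NA,NA,"S"))
)

# wt = !is.na(...) sums a logical per Region (TRUE counts as 1, FALSE as 0),
# and name = replaces the default "n" column name
df2 <- df1 %>%
  count(Region, wt = !is.na(Herbivory_type), name = "N_Herbivory_S")
print(df2)
```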

jgovgodb (answer #2)

library(dplyr)
df1 %>% 
    group_by(Region) %>%
    summarise(n_Herbivory_S = sum(Herbivory_type %in% c("S")))

(This assumes the real dataset may contain other categories that should be ignored; otherwise !is.na() is simpler.)
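The reason for %in% rather than == is how NA propagates. A minimal sketch on hypothetical toy data (the extra category "G" and the regions R1/R2 are invented for illustration, not taken from the question) compares the three counting expressions:

```r
library(dplyr)

# Hypothetical data: a second category "G" to show which values each rule counts
toy <- data.frame(
  Region = c("R1", "R1", "R1", "R2"),
  Herbivory_type = c("S", "G", NA, NA)
)

res <- toy %>%
  group_by(Region) %>%
  summarise(
    n_eq     = sum(Herbivory_type == "S"),   # NA == "S" is NA, so the sum is NA
    n_in     = sum(Herbivory_type %in% "S"), # NA %in% "S" is FALSE, so only "S" counts
    n_not_na = sum(!is.na(Herbivory_type))   # counts every non-missing type, "G" included
  )
print(res)
```

So == would need na.rm = TRUE inside sum(), %in% handles NA on its own, and !is.na() is only equivalent when "S" is the sole non-missing category.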


qhhrdooz (answer #3)

You can count the non-NA values, i.e.:

library(dplyr)

df1 %>% 
  group_by(Region) %>% 
  summarise(res = sum(!is.na(Herbivory_type)))

# A tibble: 4 × 2
  Region   res
  <chr>  <int>
1 ALI1       1
2 ALI2       0
3 ALI3       2
4 ALI5       1
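For readers not using dplyr, the same sum(!is.na(...)) idea can be sketched in base R with tapply(); this is an alternative, not part of the posted answer, and the names res and df2 are illustrative.

```r
df1 <- data.frame(
  Region = c("ALI1","ALI1","ALI1","ALI1","ALI2","ALI2","ALI2",
             "ALI3","ALI3","ALI3","ALI3","ALI5","ALI5"),
  Site = c("ALI1_A","ALI1_B","ALI1_C","ALI1_D","ALI2_A","ALI2_B","ALI2_C",
           "ALI3_A","ALI3_B","ALI3_C","ALI3_D","ALI5_A","ALI5_B"),
  Herbivory_type = factor(c(NA,"S",NA,NA,NA,NA,NA,NA,"S","S",NA,NA,"S"))
)

# Sum the logical "is not NA" within each Region; returns a named 1-d array
res <- tapply(!is.na(df1$Herbivory_type), df1$Region, sum)

# Reshape into a data frame like the desired df2
df2 <- data.frame(Region = names(res), N_Herbivory_S = as.vector(res))
print(df2)
```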
