dplyr filter across multiple columns with different conditions

5t7ly7z5  posted on 2023-03-10  in Other

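I have a large data frame containing counts of individuals of different age classes at different sites in different years. I want to reduce the data frame so that I only work with years that have at least 15 sampled individuals per age class, and with sites that have at least 2 years of data (after removing the years with fewer than 15 individuals).
Example: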

library(tidyverse)
set.seed(42)
df <- data.frame(
  site = sample(LETTERS[1:5], size = 2000, replace = TRUE),
  age  = sample(letters[1:3], size = 2000, replace = TRUE),
  year = sample(1990:1999, size = 2000, replace = TRUE)
)

# determine the site, age & year combinations with at least 15 individuals
countXyear = count(df, site, age, year) %>% filter(n >= 15)
   site age year  n
1     A   a 1991 16
2     A   a 1992 20
3     A   a 1996 19
4     A   a 1999 20
5     A   b 1991 15
6     A   b 1996 16
7     A   b 1997 15
8     A   c 1990 15
9     A   c 1993 15
10    A   c 1998 19
11    A   c 1999 18
12    B   a 1990 21
13    B   a 1993 16
14    B   a 1994 18
15    B   a 1995 24
16    B   a 1999 16
17    B   b 1991 18
18    B   b 1992 22
19    B   b 1995 18
20    B   b 1996 17
21    B   b 1998 20
22    B   b 1999 23
23    B   c 1992 15
24    B   c 1994 16
25    B   c 1999 16
26    C   a 1993 16
27    C   a 1997 20
28    C   a 1999 15
29    C   b 1999 17
30    C   c 1991 16
31    C   c 1993 19
32    C   c 1994 21
33    D   a 1990 15
34    D   a 1994 20
35    D   a 1998 21
36    D   b 1990 18
37    D   b 1994 17
38    D   b 1996 20
39    D   b 1997 15
40    D   c 1995 16
41    D   c 1996 16
42    D   c 1997 20
43    D   c 1999 16
44    E   a 1990 17
45    E   a 1996 15
46    E   a 1997 16
47    E   a 1998 15
48    E   b 1990 17
49    E   b 1991 16
50    E   b 1998 16
51    E   b 1999 16
52    E   c 1991 16
53    E   c 1992 18
54    E   c 1998 15

# determine the site & age combinations that were sampled in at least 2 years (after removing the years with fewer than 15 individuals)
countXsite = count(countXyear, site, age) %>% filter(n >= 2)
   site age n
1     A   a 4
2     A   b 3
3     A   c 4
4     B   a 5
5     B   b 6
6     B   c 3
7     C   a 3
8     C   b 1
9     C   c 3
10    D   a 3
11    D   b 4
12    D   c 4
13    E   a 4
14    E   b 4
15    E   c 3

# filter data to the sites & ages in countXsite and years in countXyear
dfSub <- filter(df,
                site == countXsite$site,
                age  == countXsite$age,
                year == countXyear$year)
Warning messages:
1: In site == countXsite$site :
  longer object length is not a multiple of shorter object length
2: In age == countXsite$age :
  longer object length is not a multiple of shorter object length
3: In year == countXyear$year :
  longer object length is not a multiple of shorter object length

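In addition, the resulting data frame has only 9 observations, which clearly should not be the case. I tried replacing the "," in the filter with "&", but that did not solve the problem. How can I solve this kind of complex filtering problem?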
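A note on the warnings above: `==` compares two vectors element by element and recycles the shorter one, so `site == countXsite$site` tests each row against whatever value happens to line up with it by position, not against the set of allowed combinations. The following is a minimal sketch on hypothetical toy data (`obs` and `keep` are illustrative names, not from the question) contrasting that positional comparison with `semi_join()`, which matches on the combination of columns:

```r
library(dplyr)

# Hypothetical toy data: 6 observations and 2 site/age/year combinations to keep
obs <- tibble(
  site = c("A", "A", "B", "B", "A", "B"),
  age  = c("a", "b", "a", "b", "a", "a"),
  year = c(1991, 1991, 1992, 1992, 1992, 1991)
)
keep <- tibble(
  site = c("A", "B"),
  age  = c("a", "a"),
  year = c(1991, 1992)
)

# `==` recycles the 2-element columns of `keep` along the 6 rows of `obs`
# and compares position by position, so the B/a/1992 row is missed
by_position <- filter(obs, site == keep$site, age == keep$age, year == keep$year)

# semi_join() keeps every row of `obs` whose site/age/year combination
# appears somewhere in `keep`
by_match <- semi_join(obs, keep, by = c("site", "age", "year"))
```

Here the lengths happen to divide evenly, so R does not even warn, yet `by_position` has one row while `by_match` has the two rows that were actually wanted.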

j9per5c4  #1

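Sorry, I had a typo in my initial answer, and I have since re-read the question. I think this gets what you want: all observations where the site/age combination has at least 2 years, each with at least 15 observations.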

df %>%
  count(site, age, year) %>%                      # individuals per site/age/year
  filter(n >= 15) %>%                             # keep years with at least 15 individuals
  add_count(site, age, name = "years_avail") %>%  # qualifying years per site/age
  filter(years_avail >= 2) %>%
  select(-n, -years_avail) %>%    # optionally drop these fields
  left_join(df, by = c("site", "age", "year"),
            multiple = "all")     # option added in dplyr 1.1.0 to
                 # allow multiple matches w/o warning, since we want
                 # to bring in all the original observations that
                 # correspond to the appropriate site/age/year combos

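I use add_count here so that the year column is kept as it is rather than being collapsed, while still counting the number of years in which a site/age combination has at least 15 observations.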
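To make the difference concrete, here is a small sketch on hypothetical per-year counts (the `counts` table is made up for illustration) comparing `count()` with `add_count()`:

```r
library(dplyr)

# Hypothetical per-year counts for two site/age groups
counts <- tibble(
  site = c("A", "A", "B"),
  age  = c("a", "a", "a"),
  year = c(1991, 1992, 1991),
  n    = c(16, 20, 15)
)

# count() collapses to one row per site/age, so the year column is lost
collapsed <- count(counts, site, age, name = "years_avail")

# add_count() keeps every row and appends the group size as a new column
kept <- add_count(counts, site, age, name = "years_avail")
```

`collapsed` has two rows and no `year` column, whereas `kept` still has all three rows, each carrying its `years_avail` value, which is what makes the later filter and join on site/age/year possible.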
