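I have a large data frame with counts of individuals of different age classes at different sites in different years. I want to subset the data frame so that I only work with years that have at least 15 sampled individuals per age class, and with sites that have at least 2 years of data (after removing the years with fewer than 15 individuals).
Example: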
library(tidyverse)
set.seed(42)
df <- data.frame(
site = sample(LETTERS[1:5], size = 2000, replace = TRUE),
age = sample(letters[1:3], size = 2000, replace = TRUE),
year = sample(1990:1999, size = 2000, replace = TRUE)
)
# determine the site, age & year combinations with at least 15 individuals
countXyear = count(df, site, age, year) %>% filter(n >= 15)
site age year n
1 A a 1991 16
2 A a 1992 20
3 A a 1996 19
4 A a 1999 20
5 A b 1991 15
6 A b 1996 16
7 A b 1997 15
8 A c 1990 15
9 A c 1993 15
10 A c 1998 19
11 A c 1999 18
12 B a 1990 21
13 B a 1993 16
14 B a 1994 18
15 B a 1995 24
16 B a 1999 16
17 B b 1991 18
18 B b 1992 22
19 B b 1995 18
20 B b 1996 17
21 B b 1998 20
22 B b 1999 23
23 B c 1992 15
24 B c 1994 16
25 B c 1999 16
26 C a 1993 16
27 C a 1997 20
28 C a 1999 15
29 C b 1999 17
30 C c 1991 16
31 C c 1993 19
32 C c 1994 21
33 D a 1990 15
34 D a 1994 20
35 D a 1998 21
36 D b 1990 18
37 D b 1994 17
38 D b 1996 20
39 D b 1997 15
40 D c 1995 16
41 D c 1996 16
42 D c 1997 20
43 D c 1999 16
44 E a 1990 17
45 E a 1996 15
46 E a 1997 16
47 E a 1998 15
48 E b 1990 17
49 E b 1991 16
50 E b 1998 16
51 E b 1999 16
52 E c 1991 16
53 E c 1992 18
54 E c 1998 15
# determine the site & age combinations that were sampled in at least 2 years (after removing the years with fewer than 15 individuals)
countXsite = count(countXyear, site, age) %>% filter(n >= 2)
site age n
1 A a 4
2 A b 3
3 A c 4
4 B a 5
5 B b 6
6 B c 3
7 C a 3
8 C b 1
9 C c 3
10 D a 3
11 D b 4
12 D c 4
13 E a 4
14 E b 4
15 E c 3
# filter data to the sites & ages in countXsite and years in countXyear
dfSub <- filter(df,
site == countXsite$site,
age == countXsite$age,
year == countXyear$year)
Warning messages:
1: In site == countXsite$site :
longer object length is not a multiple of shorter object length
2: In age == countXsite$age :
longer object length is not a multiple of shorter object length
3: In year == countXyear$year :
longer object length is not a multiple of shorter object length
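In addition, the resulting data frame has only 9 observations, which clearly should not be the case. I tried replacing the "," in the filter with "&", but that did not solve the problem. How can I solve this kind of complex filtering problem?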
1 Answer
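Sorry, I had a typo in my original answer, and then I re-read the question. I think this gets what you want: all observations where the site/age has at least 2 years, each with at least 15 observations.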
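I use add_count here so that we can keep the year column intact rather than collapsing it, while still counting the number of years in which a site/age has at least 15 observations.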