R: How to filter rows based on the number of repeated variable combinations [duplicate]

Posted by mrphzbgm on 2022-12-25 in Other
    • This question already has answers here:

Filter based on number of distinct values per group [duplicate] (2 answers)
Closed yesterday.
I have a dataset like this:

library(tibble)

data <- tibble(year = c(2010, 2010, 2012, 2010, 2011, 2011, 2013, 2013, 2010, 2011, 2012, 2013),
               state = c("ca", "ca", "ca", "ny", "ny", "ny", "ny", "ny", "wa", "wa", "wa", "wa"),
               variable2 = c("a", "b", "c", "b", "c", "a", "d", "a", "b", "b", "c", "b"),
               value = c(6, 5, 2, 6, 3, 1, 7, 8, 3, 2, 5, 7))

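I want to keep only the data for states that have at least 3 unique years. In this data, those are ny and wa. For those states I want to keep all of their rows. Because of variable2, some states have several data points in the same year, but I only care whether a state has at least 3 unique years, regardless of the value of variable2. Thanks.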
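One way to see which states qualify is to count distinct state-year combinations, which collapses the repeated years introduced by variable2. A minimal base R sketch, assuming the data is rebuilt as a plain data.frame (instead of a tibble) so it runs without extra packages; the name years_per_state is chosen here only for illustration.

```r
# Same values as the tibble above, stored as a base data.frame
data <- data.frame(
  year = c(2010, 2010, 2012, 2010, 2011, 2011, 2013, 2013, 2010, 2011, 2012, 2013),
  state = c("ca", "ca", "ca", "ny", "ny", "ny", "ny", "ny", "wa", "wa", "wa", "wa"),
  variable2 = c("a", "b", "c", "b", "c", "a", "d", "a", "b", "b", "c", "b"),
  value = c(6, 5, 2, 6, 3, 1, 7, 8, 3, 2, 5, 7)
)

# Drop duplicate state-year pairs, then count how many pairs remain per state
years_per_state <- table(unique(data[, c("state", "year")])$state)
years_per_state
# ca has 2 distinct years, ny has 3, wa has 4, so only ny and wa reach 3
```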

Answer #1 (idfiyjo8)

You can try:

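library(dplyr)

# States with at least 3 unique years
keep <- data %>%
    group_by(state) %>% summarise(n = length(unique(year))) %>%
    filter(n >= 3) %>% pull(state)
# Keep all rows of those states
data %>% filter(state %in% keep)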
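A more compact dplyr alternative filters inside the grouped data frame, so all rows of a qualifying state are kept in a single step. This is a sketch, not the original answer; it assumes the dplyr package is installed and rebuilds the data as a plain data.frame, and result is an illustrative name.

```r
library(dplyr)

data <- data.frame(
  year = c(2010, 2010, 2012, 2010, 2011, 2011, 2013, 2013, 2010, 2011, 2012, 2013),
  state = c("ca", "ca", "ca", "ny", "ny", "ny", "ny", "ny", "wa", "wa", "wa", "wa"),
  variable2 = c("a", "b", "c", "b", "c", "a", "d", "a", "b", "b", "c", "b"),
  value = c(6, 5, 2, 6, 3, 1, 7, 8, 3, 2, 5, 7)
)

# group_by() makes n_distinct(year) evaluate within each state, so filter()
# keeps or drops all rows of a state together
result <- data %>%
  group_by(state) %>%
  filter(n_distinct(year) >= 3) %>%
  ungroup()

result
```

Because the condition returns one value per group, it is recycled to every row of that group, which is why the repeated years from variable2 do not affect the outcome.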
Answer #2 (g6baxovj)

Try this. The code removes the rows of every state that has fewer than three unique years.

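# Each distinct state in the data
n <- levels(factor(data$state))

for (i in n) {
  data_group <- data[data$state == i, ]
  # Count the distinct years observed for this state
  length_year <- length(unique(data_group$year))

  # Drop every row of a state with fewer than 3 distinct years
  if (length_year < 3) {
    data <- data[data$state != i, ]
  }
}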
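The same filtering can be done without a loop by computing the per-state count once and subsetting in a single step, which avoids reassigning data on every iteration. A base R sketch, assuming a plain data.frame; n_years, keep, and filtered are illustrative names, not part of the original answer.

```r
data <- data.frame(
  year = c(2010, 2010, 2012, 2010, 2011, 2011, 2013, 2013, 2010, 2011, 2012, 2013),
  state = c("ca", "ca", "ca", "ny", "ny", "ny", "ny", "ny", "wa", "wa", "wa", "wa"),
  variable2 = c("a", "b", "c", "b", "c", "a", "d", "a", "b", "b", "c", "b"),
  value = c(6, 5, 2, 6, 3, 1, 7, 8, 3, 2, 5, 7)
)

# Number of distinct years per state, named by state
n_years <- tapply(data$year, data$state, function(x) length(unique(x)))

# States meeting the threshold of at least 3 distinct years
keep <- names(n_years)[n_years >= 3]

# One subsetting step keeps every row of the qualifying states
filtered <- data[data$state %in% keep, ]
filtered
```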
Answer #3 (aiqt4smr)

You can define a function ulen that returns the number of unique values and use it in ave().

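# \(x) is R's shorthand lambda syntax (R >= 4.1)
ulen <- \(x) length(unique(x))

# ave() returns, for each row, the number of unique years in that row's state
data[with(data, ave(year, state, FUN = ulen)) > 2, ]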
#    year state variable2 value
# 4  2010    ny         b     6
# 5  2011    ny         c     3
# 6  2011    ny         a     1
# 7  2013    ny         d     7
# 8  2013    ny         a     8
# 9  2010    wa         b     3
# 10 2011    wa         b     2
# 11 2012    wa         c     5
# 12 2013    wa         b     7
  *Data:*
data <- structure(list(year = c(2010, 2010, 2012, 2010, 2011, 2011, 2013, 
2013, 2010, 2011, 2012, 2013), state = c("ca", "ca", "ca", "ny", 
"ny", "ny", "ny", "ny", "wa", "wa", "wa", "wa"), variable2 = c("a", 
"b", "c", "b", "c", "a", "d", "a", "b", "b", "c", "b"), value = c(6, 
5, 2, 6, 3, 1, 7, 8, 3, 2, 5, 7)), class = "data.frame", row.names = c(NA, 
-12L))
