Using a custom filter (across) function with map_if (on a list of data frames in R)

Posted by 7ivaypg9 on 2023-03-05 in Other

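I am trying to filter several data frames held in a list (some of them have a date-like character column).
I know that all my data frames contain n entities (10 here), so I try to identify the ones in long format, where each row is an entity-date pair.
I then want to apply a filter function to those long data frames: find the character column that holds dates and keep only one year.
I've heard that filter_if is superseded, so I tried to do this with the across syntax, but I have failed so far.
Do you know why it isn't working?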

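library(dplyr)    # filter(), across(), where(), if_all()
library(purrr)    # map(), map_if()
library(stringr)  # str_detect()

## a list of data frames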

list_df <- list(A = data.frame(ID = letters[1:10],
                               Var1 = rnorm(10),
                               Var2 = rnorm(10),
                               Var3 = rnorm(10)),
                B = data.frame(ID = rep(letters[1:10],3),
                               X = c(rep("01/01/2018", 10), 
                                     rep("01/01/2019", 10),
                                     rep("01/01/2020", 10)),
                               Var1 = rnorm(30),
                               Var2 = rnorm(30)),
                C = data.frame(ID = rep(letters[1:10],2),
                               D = c(rep("01/01/2018", 10), 
                                     rep("01/01/2019", 10)),
                               Var1 = rnorm(20),
                               Var2 = rnorm(20)))

## a custom function to find character columns that hold dates (= B$X & C$D)

guessdate <- function(x) !all(is.na(as.Date(as.character(x),format="%d/%m/%Y"))) 

#test the function on one df
list_df[["B"]] %>% map(., guessdate)

## what i've tried so far
list_df %>% map_if(.p = ~ nrow(.x) > 10,  # apply function only on dataframe with more than 10 rows
                          ~ filter(across(where(map(., guessdate)), ~ str_detect(.x, "2018")))) ## filter the date-like column keeping only (2018) 

## desired output

output <- list(A = data.frame(ID = letters[1:10],
                               Var1 = rnorm(10),
                               Var2 = rnorm(10),
                               Var3 = rnorm(10)),
                B = data.frame(ID = letters[1:10],
                               X = c(rep("01/01/2018", 10)),
                               Var1 = rnorm(10),
                               Var2 = rnorm(10)),
                C = data.frame(ID = letters[1:10],
                               D = c(rep("01/01/2018", 10)),
                               Var1 = rnorm(10),
                               Var2 = rnorm(10)))
Answer #1, by 7vux5j2d

It looks like you have an extra map in there, and you aren't passing the data frame to filter. Try

list_df %>% map_if(.p = ~ nrow(.x) > 10,
        ~ filter(.x, if_all(where(guessdate), ~ str_detect(.x, "2018"))))
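To see why the extra map is not needed, here is a minimal sketch on a small hypothetical data frame `df` (shaped like `list_df$B`, but not part of the original data). It assumes dplyr 1.0.4 or later, where `if_all()` is available, and that stringr is installed. `where()` takes the predicate function itself and calls it on each column, so `guessdate` can be passed directly.

```r
library(dplyr)
library(stringr)

# Same predicate as in the question: TRUE when at least one value parses as dd/mm/YYYY
guessdate <- function(x) !all(is.na(as.Date(as.character(x), format = "%d/%m/%Y")))

# Hypothetical toy data, only for illustration
df <- data.frame(ID   = c("a", "b", "c"),
                 X    = c("01/01/2018", "01/01/2019", "01/01/2020"),
                 Var1 = c(0.1, 0.2, 0.3))

# where() applies the predicate column by column and keeps the columns where it is TRUE
date_cols <- names(select(df, where(guessdate)))
date_cols

# filter() needs the data frame as its first argument;
# if_all() keeps rows where the condition holds in every selected column
kept <- filter(df, if_all(where(guessdate), ~ str_detect(.x, "2018")))
kept
```

Here only `X` should be selected, and only the 2018 row should remain.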

Using across() inside filter() was deprecated in dplyr 1.0.8, so we use if_all() here.
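As a quick check, the sketch below runs the same approach end to end on a trimmed copy of the question's data and counts the rows afterwards. The `set.seed()` call and the shortened data frames (fewer `Var` columns) are my own assumptions, added only to keep the example small and reproducible.

```r
library(dplyr)
library(purrr)
library(stringr)

set.seed(1)  # assumption: fixed seed, only for reproducibility

guessdate <- function(x) !all(is.na(as.Date(as.character(x), format = "%d/%m/%Y")))

# Trimmed version of the question's list_df
list_df <- list(
  A = data.frame(ID = letters[1:10], Var1 = rnorm(10)),
  B = data.frame(ID = rep(letters[1:10], 3),
                 X = rep(c("01/01/2018", "01/01/2019", "01/01/2020"), each = 10),
                 Var1 = rnorm(30)),
  C = data.frame(ID = rep(letters[1:10], 2),
                 D = rep(c("01/01/2018", "01/01/2019"), each = 10),
                 Var1 = rnorm(20))
)

# Only data frames with more than 10 rows are filtered; the others pass through unchanged
filtered <- list_df %>%
  map_if(.p = ~ nrow(.x) > 10,
         .f = ~ filter(.x, if_all(where(guessdate), ~ str_detect(.x, "2018"))))

# Row counts per data frame after filtering
row_counts <- map_int(filtered, nrow)
row_counts
```

Each element should end up with 10 rows: `A` is untouched because it does not satisfy `.p`, while `B` and `C` keep only their 2018 rows.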
