How do I perform a filtering join with a fixed date range in R?

brccelvz, posted on 2023-05-20 in Other

I am trying to use dplyr to perform a filtering join with a fixed date range of +/- 2 days. See the example below:

library(tibble)

#Create tibble1
patient_id <- c("patient1", "patient2")
laterality <- c("L", "R")
date <- as.Date(c("2020-10-24", "2010-09-24"))
tibble1 <- tibble(patient_id, laterality, date)

#Create tibble2
patient_id <- c("patient1", "patient2", "patient1", "patient1")
laterality <- c("L", "R", "R", "L")
date <- as.Date(c("2020-10-24", "2010-09-24", "2010-09-18", "2020-10-25"))
type <- c("dark", "light", "dark", "light")
tibble2 <- tibble(patient_id, laterality, date, type)

#Create output
patient_id <- c("patient1", "patient2", "patient1")
laterality <- c("L", "R", "L")
date <- as.Date(c("2020-10-24", "2010-09-24", "2020-10-25"))
type <- c("dark", "light", "light")
output <- tibble(patient_id, laterality, date, type)

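I would like to filter tibble2 using tibble1 with a fixed date range of +/- 2 days, which should give output. I have tried semi_join, but I am not sure how to incorporate the date range.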

library(dplyr)
semi_join(tibble2, tibble1)

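I have seen other solutions that use separate from and to columns to define the range, but I would like a single fixed range that applies to all rows.
Thanks, everyone!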

odopli94 #1

Use mutate to create start/end values for each row, then join the data:

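library(dplyr)
tibble2 %>% 
  # build a +/- 2 day window around each tibble2 date, then drop the date itself
  mutate(start=date-2, end=date+2, date=NULL) %>% 
  # keep the tibble1 rows (y) whose date falls inside a tibble2 window (x)
  right_join(tibble1, join_by(patient_id, laterality, between(y$date, x$start, x$end))) %>% 
  # note: the date column in the result comes from tibble1
  select(-start, -end)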
#   patient_id laterality type  date      
#   <chr>      <chr>      <chr> <date>    
# 1 patient1   L          dark  2020-10-24
# 2 patient2   R          light 2010-09-24
# 3 patient1   L          light 2020-10-24
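
Note that the third row above carries tibble1's date (2020-10-24), whereas the output in the question keeps tibble2's own date (2020-10-25). If that matters, a filtering join may be a closer fit. The following is only a minimal sketch, assuming dplyr 1.1.0 or later, where semi_join() accepts a join_by() specification; the names ranges and result are illustrative and not from the original posts:

```r
library(dplyr)
library(tibble)

# Data from the question
tibble1 <- tibble(
  patient_id = c("patient1", "patient2"),
  laterality = c("L", "R"),
  date = as.Date(c("2020-10-24", "2010-09-24"))
)
tibble2 <- tibble(
  patient_id = c("patient1", "patient2", "patient1", "patient1"),
  laterality = c("L", "R", "R", "L"),
  date = as.Date(c("2020-10-24", "2010-09-24", "2010-09-18", "2020-10-25")),
  type = c("dark", "light", "dark", "light")
)

# Turn each tibble1 date into a fixed +/- 2 day window (illustrative helper table)
ranges <- transmute(tibble1, patient_id, laterality,
                    start = date - 2, end = date + 2)

# Keep the tibble2 rows that match on patient_id and laterality
# and whose own date falls inside the matching window
result <- semi_join(
  tibble2, ranges,
  join_by(patient_id, laterality, between(x$date, y$start, y$end))
)
result
```

Because semi_join() only filters its first argument, the columns and dates of tibble2 are returned unchanged, and each tibble2 row appears at most once even if it falls inside several windows.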

tjrkku2a #2

I assume you want the range to be defined by the minimum and maximum dates in tibble1?
You can get that result with dplyr::filter() and dplyr::between():

filter(tibble2, between(date,
                        min(tibble1$date) - 2, 
                        max(tibble1$date) + 2))
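
One thing to keep in mind with this approach: it applies a single global window to every row and does not match on patient_id or laterality. A small sketch of what that means, using a hypothetical extra row (patient3, not part of the question's data) and the illustrative names tibble2_extra and global_range:

```r
library(dplyr)
library(tibble)

tibble1 <- tibble(
  patient_id = c("patient1", "patient2"),
  laterality = c("L", "R"),
  date = as.Date(c("2020-10-24", "2010-09-24"))
)

# tibble2 from the question plus one hypothetical row (patient3)
# that has no counterpart in tibble1
tibble2_extra <- tibble(
  patient_id = c("patient1", "patient2", "patient1", "patient1", "patient3"),
  laterality = c("L", "R", "R", "L", "L"),
  date = as.Date(c("2020-10-24", "2010-09-24", "2010-09-18",
                   "2020-10-25", "2015-01-01")),
  type = c("dark", "light", "dark", "light", "dark")
)

# One window from (earliest tibble1 date - 2) to (latest tibble1 date + 2)
global_range <- filter(
  tibble2_extra,
  between(date, min(tibble1$date) - 2, max(tibble1$date) + 2)
)
global_range
```

Here the patient3 row is kept because its date lies between the two extremes, even though no tibble1 row is within 2 days of it. On the question's original data the result happens to coincide with the desired output.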
