Join data frames by group between a start date and an end date in R

Posted by 6tqwzwtp on 2023-06-19 in Other

I have two data frames:

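  1. df1 has three columns: a start date (date_start), an end date (date_end), and a severity category (category).
  2. df2 has a single date column (date) holding consecutive daily dates in January 2020.

How can I join df1 to df2 so that every date in df2 that falls between date_start and date_end (inclusive) gets the category from that row of df1, and all other dates get NA?

Example data: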
df1 <- data.frame(matrix(ncol = 3, nrow = 2))
colnames(df1)[1:3] <- c('date_start','date_end','category')
df1$date_start <- c(as.Date('2020-01-02'),
                    as.Date('2020-01-10'))
df1$date_end <- c(as.Date('2020-01-05'),
                  as.Date('2020-01-13'))
df1$category <- c('moderate','strong')

df2 <- data.frame(matrix(ncol = 1, nrow = 15))
colnames(df2)[1] <- 'date'
df2$date <- seq.Date(as.Date('2020-01-01'),
                     as.Date('2020-01-15'),
                     1)

df1
df2

The desired data frame looks like this:

         date category
1  2020-01-01       NA
2  2020-01-02 moderate
3  2020-01-03 moderate
4  2020-01-04 moderate
5  2020-01-05 moderate
6  2020-01-06       NA
7  2020-01-07       NA
8  2020-01-08       NA
9  2020-01-09       NA
10 2020-01-10   strong
11 2020-01-11   strong
12 2020-01-12   strong
13 2020-01-13   strong
14 2020-01-14       NA
15 2020-01-15       NA

Answer 1 (omtl5h9j)

Use join_by() with between() (requires dplyr 1.1.0 or later):

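library(dplyr)  # join_by() requires dplyr >= 1.1.0

# Keep every row of df2 and attach category where date_start <= date <= date_end
left_join(df2, df1, by = join_by(between(date, date_start, date_end))) %>% 
  select(date, category) %>% head(n = 15)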

#          date category
# 1  2020-01-01     <NA>
# 2  2020-01-02 moderate
# 3  2020-01-03 moderate
# 4  2020-01-04 moderate
# 5  2020-01-05 moderate
# 6  2020-01-06     <NA>
# 7  2020-01-07     <NA>
# 8  2020-01-08     <NA>
# 9  2020-01-09     <NA>
# 10 2020-01-10   strong
# 11 2020-01-11   strong
# 12 2020-01-12   strong
# 13 2020-01-13   strong
# 14 2020-01-14     <NA>
# 15 2020-01-15     <NA>
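For reference, between(date, date_start, date_end) inside join_by() is an inclusive range test, so the same join can be sketched with two explicit inequalities. This is a self-contained sketch under the assumption that dplyr 1.1.0 or later is installed; it rebuilds the question's example data with data.frame() instead of matrix() purely for brevity.

```r
library(dplyr)  # join_by() with inequality conditions needs dplyr >= 1.1.0

# Same example data as in the question, built directly with data.frame()
df1 <- data.frame(
  date_start = as.Date(c("2020-01-02", "2020-01-10")),
  date_end   = as.Date(c("2020-01-05", "2020-01-13")),
  category   = c("moderate", "strong")
)
df2 <- data.frame(date = seq(as.Date("2020-01-01"), as.Date("2020-01-15"), by = "day"))

# Equivalent to join_by(between(date, date_start, date_end)):
# `date` comes from df2 (x), the bounds come from df1 (y), both bounds inclusive
res <- left_join(df2, df1, by = join_by(date >= date_start, date <= date_end)) %>%
  select(date, category)
res
```

Writing the bounds out this way makes it easy to switch to a half-open range, e.g. date < date_end, if the end date should be excluded.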

Answer 2 (cqoc49vn)

Alternatively, use the sqldf package:

library(sqldf)
# LEFT JOIN keeps every date in df2; BETWEEN is inclusive at both ends
sqldf("SELECT df2.date, df1.category 
       FROM df2
       LEFT JOIN df1 
       ON df2.date BETWEEN df1.date_start AND df1.date_end")
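If adding a package is not an option, the same lookup can be sketched in base R with findInterval(). This sketch assumes the intervals in df1 do not overlap (each date matches at most one row), which holds for the example data; with overlapping intervals it only checks the interval with the latest start date and would not return every matching category.

```r
# Same example data as in the question
df1 <- data.frame(
  date_start = as.Date(c("2020-01-02", "2020-01-10")),
  date_end   = as.Date(c("2020-01-05", "2020-01-13")),
  category   = c("moderate", "strong")
)
df2 <- data.frame(date = seq(as.Date("2020-01-01"), as.Date("2020-01-15"), by = "day"))

# findInterval() needs the start dates sorted in increasing order
df1 <- df1[order(df1$date_start), ]

# Index of the last interval whose start is <= each date (0 = before every start)
i <- findInterval(df2$date, df1$date_start)
i[i == 0] <- NA

# A date is covered only if it is also on or before that interval's end
inside <- !is.na(i) & df2$date <= df1$date_end[i]

# Covered dates get the interval's category, everything else NA
df2$category <- ifelse(inside, df1$category[i], NA_character_)
df2
```

Unlike the joins, this adds category to df2 in place and can never duplicate rows.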
