R: date column is replaced by one of the join keys after a data.table join

px9o7tmv  posted on 2022-12-20  in Other

I have the following data frames, df and df_dates (dput below):

> df
   group      start        end
1      A 2022-12-01 2022-12-04
2      A 2022-12-04 2022-12-07
3      A 2022-12-07 2022-12-10
4      A 2022-12-10 2022-12-13
5      A 2022-12-13 2022-12-16
6      A 2022-12-16 2022-12-19
7      B 2022-12-01 2022-12-04
8      B 2022-12-04 2022-12-07
9      B 2022-12-07 2022-12-10
10     B 2022-12-10 2022-12-13
11     B 2022-12-13 2022-12-16
12     B 2022-12-16 2022-12-19
> df_dates
  group       date value
1     A 2022-12-02     1
2     A 2022-12-14     3
3     B 2022-12-06     2
4     B 2022-12-13     4

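I would like to join the rows of df_dates to df by group, wherever the date column falls between the start and end columns of df. When I join the two data frames, however, the date column comes back with the same dates as the start column of df. Here are the code and the output: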

df <- data.frame(group = rep(c('A', 'B'), each = 6),
                 start = c(seq.Date(as.Date('2022-12-01'), as.Date('2022-12-16'), '3 days')),
                 end = c(seq.Date(as.Date('2022-12-04'), as.Date('2022-12-19'), '3 days')))
df_dates <- data.frame(group = c('A', 'A', 'B', 'B'),
                       date = as.Date(c('2022-12-02', '2022-12-14', '2022-12-06', '2022-12-13')),
                       value = c(1,3,2,4))
library(data.table)
setDT(df)
setDT(df_dates)
df_dates[df, 
         .(group, date, start, end, value), 
         on = .(group, date >= start, date <= end)]
#>     group       date      start        end value
#>  1:     A 2022-12-01 2022-12-01 2022-12-04     1
#>  2:     A 2022-12-04 2022-12-04 2022-12-07    NA
#>  3:     A 2022-12-07 2022-12-07 2022-12-10    NA
#>  4:     A 2022-12-10 2022-12-10 2022-12-13    NA
#>  5:     A 2022-12-13 2022-12-13 2022-12-16     3
#>  6:     A 2022-12-16 2022-12-16 2022-12-19    NA
#>  7:     B 2022-12-01 2022-12-01 2022-12-04    NA
#>  8:     B 2022-12-04 2022-12-04 2022-12-07     2
#>  9:     B 2022-12-07 2022-12-07 2022-12-10    NA
#> 10:     B 2022-12-10 2022-12-10 2022-12-13     4
#> 11:     B 2022-12-13 2022-12-13 2022-12-16     4
#> 12:     B 2022-12-16 2022-12-16 2022-12-19    NA

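Created on 2022-12-12 with reprex v2.0.2
As you can see, the dates in the date column have been replaced by the dates from the start column, whereas I want them to stay the same as in the df_dates data frame. The desired output is: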

#>     group       date      start        end value
#>  1:     A 2022-12-02 2022-12-01 2022-12-04     1
#>  2:     A         NA 2022-12-04 2022-12-07    NA
#>  3:     A         NA 2022-12-07 2022-12-10    NA
#>  4:     A         NA 2022-12-10 2022-12-13    NA
#>  5:     A 2022-12-14 2022-12-13 2022-12-16     3
#>  6:     A         NA 2022-12-16 2022-12-19    NA
#>  7:     B         NA 2022-12-01 2022-12-04    NA
#>  8:     B 2022-12-06 2022-12-04 2022-12-07     2
#>  9:     B         NA 2022-12-07 2022-12-10    NA
#> 10:     B 2022-12-13 2022-12-10 2022-12-13     4
#> 11:     B 2022-12-13 2022-12-13 2022-12-16     4
#> 12:     B         NA 2022-12-16 2022-12-19    NA

So I was wondering if anyone knows how to join these two data frames correctly using data.table?
dput of df and df_dates:

df <- structure(list(group = c("A", "A", "A", "A", "A", "A", "B", "B", 
"B", "B", "B", "B"), start = structure(c(19327, 19330, 19333, 
19336, 19339, 19342, 19327, 19330, 19333, 19336, 19339, 19342
), class = "Date"), end = structure(c(19330, 19333, 19336, 19339, 
19342, 19345, 19330, 19333, 19336, 19339, 19342, 19345), class = "Date")), class = "data.frame", row.names = c(NA, 
-12L))

df_dates <- structure(list(group = c("A", "A", "B", "B"), date = structure(c(19328, 
19340, 19332, 19339), class = "Date"), value = c(1, 3, 2, 4)), class = "data.frame", row.names = c(NA, 
-4L))
Answer 1 (yeotifhr)

library(data.table)
setDT(df)
setDT(df_dates)

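Just specify that you want the original date. In a non-equi join, the unprefixed join column in the result takes its values from the table passed as i, which is why date shows the start values. You can use the prefixes x. and i. to say which table a column comes from: here x. refers to df_dates and i. refers to df, so x.date is the original date from df_dates.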

df_dates[df, 
         .(group, x.date, start, end, value), 
         on = .(group, date >= start, date <= end)]
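
To make the effect of the prefixes visible, here is a minimal sketch on two toy tables. The names x_dt, i_dt and the output column names are made up for illustration and do not appear in the question. It selects the unprefixed date next to x.date and i.start, so the three can be compared side by side.

```r
library(data.table)

# Hypothetical toy tables, not taken from the question
x_dt <- data.table(group = "A", date = as.Date("2022-12-02"), value = 1)
i_dt <- data.table(group = "A",
                   start = as.Date(c("2022-12-01", "2022-12-04")),
                   end   = as.Date(c("2022-12-04", "2022-12-07")))

res <- x_dt[i_dt,
            .(group,
              joined_date = date,   # unprefixed join column: filled from i's start
              x_date  = x.date,     # original date from x_dt, NA where nothing matched
              i_start = i.start,    # start column of i_dt
              end, value),
            on = .(group, date >= start, date <= end)]
res
```

In this sketch joined_date and i_start should be identical, while x_date holds the real date for the first interval and NA for the second, which has no match.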

Or modify the original table by reference:

df[, c("date", "value") := 
       df_dates[.SD, on = .(group, date >= start, date <= end), .(x.date, value)]]
#      group      start        end       date value
#     <char>     <Date>     <Date>     <Date> <num>
#  1:      A 2022-12-01 2022-12-04 2022-12-02     1
#  2:      A 2022-12-04 2022-12-07       <NA>    NA
#  3:      A 2022-12-07 2022-12-10       <NA>    NA
#  4:      A 2022-12-10 2022-12-13       <NA>    NA
#  5:      A 2022-12-13 2022-12-16 2022-12-14     3
#  6:      A 2022-12-16 2022-12-19       <NA>    NA
#  7:      B 2022-12-01 2022-12-04       <NA>    NA
#  8:      B 2022-12-04 2022-12-07 2022-12-06     2
#  9:      B 2022-12-07 2022-12-10       <NA>    NA
# 10:      B 2022-12-10 2022-12-13 2022-12-13     4
# 11:      B 2022-12-13 2022-12-16 2022-12-13     4
# 12:      B 2022-12-16 2022-12-19       <NA>    NA
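
As a quick sanity check on the result, the sketch below rebuilds the question's data directly as data.tables, repeats the join with x.date, and then tests two things: that the join returns one row per interval (which the := variant relies on, since the assigned columns must line up with the rows of df), and that every matched date lies inside its interval. The helper names res, same_rows and in_range are assumptions made for this example.

```r
library(data.table)

# Data from the question, built directly as data.tables
df <- data.table(
  group = rep(c("A", "B"), each = 6),
  start = rep(seq.Date(as.Date("2022-12-01"), as.Date("2022-12-16"), by = "3 days"), 2),
  end   = rep(seq.Date(as.Date("2022-12-04"), as.Date("2022-12-19"), by = "3 days"), 2)
)
df_dates <- data.table(
  group = c("A", "A", "B", "B"),
  date  = as.Date(c("2022-12-02", "2022-12-14", "2022-12-06", "2022-12-13")),
  value = c(1, 3, 2, 4)
)

# Same join as in the answer, keeping the original date under the name `date`
res <- df_dates[df,
                .(group, date = x.date, start, end, value),
                on = .(group, date >= start, date <= end)]

# One result row per interval: needed for the `:=` variant to line up with df
same_rows <- nrow(res) == nrow(df)

# Every matched date falls inside its interval (both bounds inclusive)
in_range <- res[!is.na(date), all(date >= start & date <= end)]
```

If an interval could match more than one date in your own data, same_rows would be FALSE and the := form would need a rule for picking a single match per interval.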
