In R, how to compute the difference between two date columns by group (id) while keeping the first available date as the reference

Posted by dgenwo3n on 2022-12-06 in Other

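How can I compute the time between two date columns by group, while keeping the first (earliest) date in each group as the reference? For example, for id N04 the reference date_1 should stay 2009-07-10 until the next id begins. I think I am close, but I have not managed to find the right solution.
Please find a minimal working example below: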

id <- c("N02", "N02", "N03", "N03", "N04", "N04", "N04", "N04", "N04", "N04")
date_1 <- c ("2008-03-15", "2008-04-15", "2008-06-15", "2008-07-15", "2009-07-10", "2009-07-13", "2009-07-15", "2009-07-16", "2009-07-17", "2009-07-20")
date_2 <- c ("2008-03-15", "2008-04-15", "2008-06-15", "2008-07-15", "2009-07-10", "2009-07-13", "2009-07-15", "2009-07-16", "2009-07-17", "2009-07-20")
df1 <- data.frame (id, date_1, date_2)
> df1
    id     date_1     date_2
1  N02 2008-03-15 2008-03-15
2  N02 2008-04-15 2008-04-15
3  N03 2008-06-15 2008-06-15
4  N03 2008-07-15 2008-07-15
5  N04 2009-07-10 2009-07-10
6  N04 2009-07-13 2009-07-13
7  N04 2009-07-15 2009-07-15
8  N04 2009-07-16 2009-07-16
9  N04 2009-07-17 2009-07-17
10 N04 2009-07-20 2009-07-20

My failed attempt:

library(dplyr)
df2 <- df1 %>% group_by (id) %>% mutate (diff = difftime (date_2, lag (date_1, default = date_1[1]), units = "days"))
> df2
# A tibble: 10 × 4
# Groups:   id [3]
   id    date_1     date_2     diff         
   <chr> <chr>      <chr>      <drtn>       
 1 N02   2008-03-15 2008-03-15  0.00000 days
 2 N02   2008-04-15 2008-04-15 30.95833 days
 3 N03   2008-06-15 2008-06-15  0.00000 days
 4 N03   2008-07-15 2008-07-15 30.00000 days
 5 N04   2009-07-10 2009-07-10  0.00000 days
 6 N04   2009-07-13 2009-07-13  3.00000 days
 7 N04   2009-07-15 2009-07-15  2.00000 days
 8 N04   2009-07-16 2009-07-16  1.00000 days
 9 N04   2009-07-17 2009-07-17  1.00000 days
10 N04   2009-07-20 2009-07-20  3.00000 days

However, I would like something like this:

id <- c("N02", "N02", "N03", "N03", "N04", "N04", "N04", "N04", "N04", "N04")
date_1 <- c ("2008-03-15", "2008-04-15", "2008-06-15", "2008-07-15", "2009-07-10", "2009-07-13", "2009-07-15", "2009-07-16", "2009-07-17", "2009-07-20")
date_2 <- c ("2008-03-15", "2008-04-15", "2008-06-15", "2008-07-15", "2009-07-10", "2009-07-13", "2009-07-15", "2009-07-16", "2009-07-17", "2009-07-20")
diff <- c("0.00000 days", "30.95833 days", "0.00000 days", "30.00000 days", "0.00000 days", "3.00000 days", "5.00000 days", "6.00000 days", "7.00000 days", "10.0000 days")
df2 <- data.frame (id, date_1, date_2, diff)
> df2
    id     date_1     date_2          diff
1  N02 2008-03-15 2008-03-15  0.00000 days
2  N02 2008-04-15 2008-04-15 30.95833 days
3  N03 2008-06-15 2008-06-15  0.00000 days
4  N03 2008-07-15 2008-07-15 30.00000 days
5  N04 2009-07-10 2009-07-10  0.00000 days
6  N04 2009-07-13 2009-07-13  3.00000 days
7  N04 2009-07-15 2009-07-15  5.00000 days
8  N04 2009-07-16 2009-07-16  6.00000 days
9  N04 2009-07-17 2009-07-17  7.00000 days
10 N04 2009-07-20 2009-07-20  10.0000 days

Thanks in advance for your help. Charles

Answer #1 (elcex8rz)

You almost had it: just use [[1]] (or dplyr::first()) instead of lag().
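A minimal sketch of that suggestion, applied to the data from the question. The conversion to Date with as.Date() is my own addition and not part of the original answer; it assumes the columns hold plain calendar dates, which also means the second row comes out as a whole 31 days rather than the 30.95833 shown in the question (that fraction looks like a daylight-saving artifact of parsing the strings as local date-times).

```r
library(dplyr)

# Data from the question
id <- c("N02", "N02", "N03", "N03", "N04", "N04", "N04", "N04", "N04", "N04")
date_1 <- c("2008-03-15", "2008-04-15", "2008-06-15", "2008-07-15", "2009-07-10",
            "2009-07-13", "2009-07-15", "2009-07-16", "2009-07-17", "2009-07-20")
date_2 <- date_1
df1 <- data.frame(id, date_1, date_2)

df2 <- df1 %>%
  # Assumption: treat the strings as calendar dates (not in the original answer)
  mutate(across(c(date_1, date_2), as.Date)) %>%
  group_by(id) %>%
  # Reference is the first date_1 of each id, not the previous row
  mutate(diff = difftime(date_2, first(date_1), units = "days")) %>%
  ungroup()

print(df2)
```

Note that first(date_1) takes the first row of each group, so it only equals the earliest date when the rows are already sorted by date within each id. If that is not guaranteed, min(date_1) in its place would pick the earliest date regardless of row order.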
