I have the following dataframe df (dput below):
> df
group date1 date2 value
1 A 2023-01-04 2023-01-06 1
2 A 2023-01-06 2023-01-07 2
3 A 2023-01-08 2023-01-09 3
4 B 2023-01-05 2023-01-06 3
5 B 2023-01-06 2023-01-08 2
6 B 2023-01-08 2023-01-10 1
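I would like to complete the dates that are missing between the start date 2023-01-01 and the end date 2023-01-10. For group A, this means that the intervals 2023-01-01 (date1) to 2023-01-04 (date2), 2023-01-07 to 2023-01-08, and 2023-01-09 to 2023-01-10 are missing. The desired output should look like this: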
group date1 date2 value
1 A 2023-01-01 2023-01-04 NA
2 A 2023-01-04 2023-01-06 1
3 A 2023-01-06 2023-01-07 2
4 A 2023-01-07 2023-01-08 NA
5 A 2023-01-08 2023-01-09 3
6 A 2023-01-09 2023-01-10 NA
7 B 2023-01-01 2023-01-05 NA
8 B 2023-01-05 2023-01-06 3
9 B 2023-01-06 2023-01-08 2
10 B 2023-01-08 2023-01-10 1
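As you can see, the missing dates are now filled in with NA values to make the sequence complete. So I was wondering if anyone knows how to complete these dates based on a start and end date per group?
dput of df: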
structure(list(group = c("A", "A", "A", "B", "B", "B"), date1 = c("2023-01-04",
"2023-01-06", "2023-01-08", "2023-01-05", "2023-01-06", "2023-01-08"
), date2 = c("2023-01-06", "2023-01-07", "2023-01-09", "2023-01-06",
"2023-01-08", "2023-01-10"), value = c(1, 2, 3, 3, 2, 1)), class = "data.frame", row.names = c(NA,
-6L))
3 Answers
Answer #1
Try this:
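A minimal sketch of one way to do this with dplyr, not necessarily the exact code the answer intended. It assumes the intervals within each group do not overlap and lie inside the overall range. The names range_start, range_end and df_dates are introduced here for illustration. The idea is to collect every boundary date per group, pair consecutive boundaries into intervals, and join the known values back on.

```r
library(dplyr)

df <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  date1 = c("2023-01-04", "2023-01-06", "2023-01-08",
            "2023-01-05", "2023-01-06", "2023-01-08"),
  date2 = c("2023-01-06", "2023-01-07", "2023-01-09",
            "2023-01-06", "2023-01-08", "2023-01-10"),
  value = c(1, 2, 3, 3, 2, 1)
)

# Assumed overall range (names introduced for this sketch)
range_start <- as.Date("2023-01-01")
range_end   <- as.Date("2023-01-10")

# Work with real Date columns rather than character strings
df_dates <- df %>% mutate(across(c(date1, date2), as.Date))

out <- df_dates %>%
  group_by(group) %>%
  reframe({
    # All interval boundaries in this group, plus the overall range limits
    brk <- sort(unique(c(range_start, date1, date2, range_end)))
    # Pair consecutive boundaries into (date1, date2) intervals
    tibble(date1 = head(brk, -1), date2 = tail(brk, -1))
  }) %>%
  # Intervals that were not in the original data get value = NA
  left_join(df_dates, by = c("group", "date1", "date2"))

out
```

With the sample data this should yield ten rows, six for group A and four for group B, with NA in value for the filled-in intervals.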
(In dplyr 1.1.0 or later, reframe is probably preferred over summarize.)
Answer #2
Another option, albeit longer:
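As a hedged sketch of what a longer, more explicit approach could look like (an illustration, not necessarily this answer's original code): build the gap rows by hand, one set for gaps before each interval and one for a trailing gap up to the end date, then bind them to the data. It assumes non-overlapping intervals per group. The names range_start, range_end, df_dates, gaps_before and gaps_after are made up for this example.

```r
library(dplyr)

df <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  date1 = c("2023-01-04", "2023-01-06", "2023-01-08",
            "2023-01-05", "2023-01-06", "2023-01-08"),
  date2 = c("2023-01-06", "2023-01-07", "2023-01-09",
            "2023-01-06", "2023-01-08", "2023-01-10"),
  value = c(1, 2, 3, 3, 2, 1)
)

# Assumed overall range (names introduced for this sketch)
range_start <- as.Date("2023-01-01")
range_end   <- as.Date("2023-01-10")

df_dates <- df %>% mutate(across(c(date1, date2), as.Date))

# Gaps between the previous interval's end (or the range start) and each interval's start
gaps_before <- df_dates %>%
  group_by(group) %>%
  arrange(date1, .by_group = TRUE) %>%
  mutate(gap_start = lag(date2, default = range_start), gap_end = date1) %>%
  ungroup() %>%
  filter(gap_start < gap_end) %>%
  transmute(group, date1 = gap_start, date2 = gap_end, value = NA_real_)

# Trailing gap between each group's last interval end and the range end
gaps_after <- df_dates %>%
  group_by(group) %>%
  summarize(date1 = max(date2), .groups = "drop") %>%
  filter(date1 < range_end) %>%
  mutate(date2 = range_end, value = NA_real_)

# Combine original rows with the gap rows and restore chronological order
out <- bind_rows(df_dates, gaps_before, gaps_after) %>%
  arrange(group, date1)

out
```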
Gives:
Answer #3
Using dplyr and iv_set_complement() from ivs, a package dedicated to working with intervals:
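A sketch of how this might look, assuming ivs 0.2.0 or later (where iv_set_complement() is available) and dplyr 1.1.0 or later (for reframe() and .by). The names range_start, range_end, df_dates and missing_rows are introduced here for illustration. iv_set_complement() returns the parts of the lower/upper range not covered by the given intervals, which are exactly the rows to fill in.

```r
library(dplyr)
library(ivs)

df <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  date1 = c("2023-01-04", "2023-01-06", "2023-01-08",
            "2023-01-05", "2023-01-06", "2023-01-08"),
  date2 = c("2023-01-06", "2023-01-07", "2023-01-09",
            "2023-01-06", "2023-01-08", "2023-01-10"),
  value = c(1, 2, 3, 3, 2, 1)
)

# Assumed overall range (names introduced for this sketch)
range_start <- as.Date("2023-01-01")
range_end   <- as.Date("2023-01-10")

df_dates <- df %>% mutate(across(c(date1, date2), as.Date))

# Per group, find the intervals of [range_start, range_end) not covered by the data
missing_rows <- df_dates %>%
  reframe(
    range = iv_set_complement(
      iv(date1, date2),
      lower = range_start,
      upper = range_end
    ),
    .by = group
  ) %>%
  # Turn the interval vector back into plain start/end date columns
  transmute(
    group,
    date1 = iv_start(range),
    date2 = iv_end(range),
    value = NA_real_
  )

out <- bind_rows(df_dates, missing_rows) %>%
  arrange(group, date1)

out
```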