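I have a data frame, mydf, with a timestamp column recorded per device. I want to keep merging timestamps as long as the gap between two consecutive timestamps is 30 minutes or less. The first timestamp of a run is marked as the start timestamp; once a gap exceeds 30 minutes, the visit ends and the last timestamp before the gap becomes the end timestamp, as shown in the example below.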
df<-data.frame(customer=rep("XYZ",4),device=rep("x",4),time_stamps=c("2020-05-13 07:50:06","2020-05-13 07:55:06","2020-05-13 08:05:06","2020-05-13 08:50:06"))
df1<-data.frame(customer=rep("XYZ",3),device=rep("y",3),time_stamps=c("2020-05-14 07:50:06","2020-05-14 08:15:06","2020-05-14 08:25:06"))
df2<-data.frame(customer=rep("XYZ",1),device=rep("z",1),time_stamps=c("2020-05-16 09:50:06"))
df3<-data.frame(customer=rep("XYZ",2),device=rep("a",2),time_stamps=c("2020-05-16 09:50:06","2020-05-16 19:50:06"))
df4<-data.frame(customer=rep("XYZ",2),device=rep("b",2),time_stamps=c("2020-05-17 09:50:06","2020-05-17 10:15:06"))
df5<-data.frame(customer=rep("XYZ",4),device=rep("c",4),time_stamps=c("2020-05-13 07:50:06","2020-05-13 07:55:06","2020-05-13 08:05:06","2020-05-13 08:32:06"))
mydf<-rbind(df,df1,df2,df3,df4,df5)
This is the data frame I expect:
expected_df<-data.frame(customer=rep("XYZ",8),device=c("x","x","y","z","a","a","b","c"),
start_timestamp=c("2020-05-13 07:50:06","2020-05-13 08:50:06","2020-05-14 07:50:06","2020-05-16 09:50:06","2020-05-16 09:50:06","2020-05-16 19:50:06","2020-05-17 09:50:06","2020-05-13 07:50:06"),
end_startstamp=c("2020-05-13 08:05:06","2020-05-13 08:50:06","2020-05-14 08:25:06","2020-05-16 09:50:06","2020-05-16 09:50:06","2020-05-16 19:50:06","2020-05-17 10:15:06","2020-05-13 08:32:06"))
1 Answer
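The key is to build a grouping variable that we can pass to group_by. To do that, we flag the gaps between consecutive timestamps that exceed 30 * 60 seconds, then use rle to consolidate them into runs, one per visit. Then summarise each group to get its first and last timestamp.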
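The answer's original code did not survive on this page, so the following is only a minimal sketch of the approach described above, not the answerer's exact code. It assumes dplyr 1.0.0 or later (for the .groups argument) and swaps in cumsum for the rle step: a running count of "gap longer than 30 minutes" flags serves as the visit id. The names max_gap, visit, and result are illustrative, and the timestamps are parsed as UTC purely as an assumption.

```r
library(dplyr)

# Rebuild the question's sample data so the sketch runs on its own.
mydf <- data.frame(
  customer = "XYZ",
  device = rep(c("x", "y", "z", "a", "b", "c"), times = c(4, 3, 1, 2, 2, 4)),
  time_stamps = c(
    "2020-05-13 07:50:06", "2020-05-13 07:55:06", "2020-05-13 08:05:06", "2020-05-13 08:50:06",
    "2020-05-14 07:50:06", "2020-05-14 08:15:06", "2020-05-14 08:25:06",
    "2020-05-16 09:50:06",
    "2020-05-16 09:50:06", "2020-05-16 19:50:06",
    "2020-05-17 09:50:06", "2020-05-17 10:15:06",
    "2020-05-13 07:50:06", "2020-05-13 07:55:06", "2020-05-13 08:05:06", "2020-05-13 08:32:06"
  )
)

max_gap <- 30 * 60  # threshold in seconds (illustrative name)

result <- mydf %>%
  # Time zone is an assumption; the question does not state one.
  mutate(time_stamps = as.POSIXct(time_stamps, tz = "UTC")) %>%
  arrange(customer, device, time_stamps) %>%
  group_by(customer, device) %>%
  # A new visit starts at the first row of each device and after every gap
  # longer than max_gap. as.numeric() forces the differences into seconds,
  # because diff() on POSIXct picks its units automatically.
  mutate(visit = cumsum(c(TRUE, diff(as.numeric(time_stamps)) > max_gap))) %>%
  group_by(customer, device, visit) %>%
  # Column names follow expected_df from the question.
  summarise(
    start_timestamp = min(time_stamps),
    end_startstamp  = max(time_stamps),
    .groups = "drop"
  )

print(result)
```

The result has one row per visit, sorted by device because of the arrange step, so the row order differs from expected_df, and the timestamp columns are POSIXct rather than character. A gap of exactly 30 minutes stays inside the same visit, since the comparison is strictly greater than max_gap.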
Created on 2020-06-25 by the reprex package (v0.3.0)