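I have two time series of cumulative counts at the entrance and the exit of an area. The vertical distance between the series can be read as the instantaneous occupancy of the area, and the horizontal distance as the average "dwell" time. The real data has 1-minute resolution, with an observation for every minute (sample data at the bottom).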
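The vertical distance (occupancy) only requires evaluating both series at the same times. In the sample data the two series are not observed at identical timestamps (2081 has no row at 1670154300), so a minimal sketch interpolates both onto a shared grid with base R's `approx()`. The toy vectors and the function name `occupancy()` are hypothetical, and times are plain numeric seconds rather than POSIXct.

```r
# Hypothetical toy data: cumulative entries and exits sampled at different times (seconds)
entry_t <- c(0, 60, 120, 180)
entry_y <- c(0, 10, 25, 30)
exit_t  <- c(0, 120, 180)
exit_y  <- c(0, 8, 24)

# Occupancy(t) = entries(t) - exits(t), each series linearly interpolated at t
occupancy <- function(t, entry_t, entry_y, exit_t, exit_y) {
  approx(entry_t, entry_y, xout = t)$y - approx(exit_t, exit_y, xout = t)$y
}

grid <- seq(0, 180, by = 60)
occupancy(grid, entry_t, entry_y, exit_t, exit_y)  # c(0, 6, 17, 6)
```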
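What is a simple, readable, and efficient way to compute the horizontal time delay between the two series over the course of a day? For example, at each point in series 2105, how many minutes is it to the (first) linearly interpolated point on series 2081 that has the same y value?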
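To make that definition concrete, here is a minimal sketch on hypothetical toy vectors (not the question's data): for each y value of series A, find the first time at which the linear interpolant of series B reaches it. The helper `first_time_at()` is hypothetical. It uses `findInterval()` rather than `approx(x = b_y, y = b_t, ties = min)`, because collapsing the tied y values of a plateau into a single point would shift the segment that follows it; the "(first)" in the question only matters on such plateaus.

```r
# Hypothetical toy data (times in minutes); B has a plateau between t = 10 and t = 20
a_t <- c(0, 10, 20, 30)
a_y <- c(0, 40, 60, 100)
b_t <- c(0, 10, 20, 30, 40)
b_y <- c(0, 20, 20, 60, 100)

# First time at which the linear interpolant of (b_t, b_y) reaches y.
# Assumes b_y is non-decreasing (cumulative counts) and y contains no NAs.
# Returns NA when y is below b_y[1] or above max(b_y).
first_time_at <- function(y, b_t, b_y) {
  n <- length(b_y)
  # index of the first knot with b_y >= y (number of knots strictly below y, plus 1)
  j <- findInterval(y, b_y, left.open = TRUE) + 1
  out <- rep(NA_real_, length(y))
  # y equals the starting value: the crossing is at the first knot
  out[j == 1 & y == b_y[1]] <- b_t[1]
  # otherwise interpolate on segment (j - 1, j), where b_y[j - 1] < y <= b_y[j]
  mid <- j >= 2 & j <= n
  jm <- j[mid]
  frac <- (y[mid] - b_y[jm - 1]) / (b_y[jm] - b_y[jm - 1])
  out[mid] <- b_t[jm - 1] + frac * (b_t[jm] - b_t[jm - 1])
  out
}

# Horizontal delay (minutes) at every point of A
delay <- first_time_at(a_y, b_t, b_y) - a_t
delay  # c(0, 15, 10, 10)
```

On the plateau, `first_time_at(20, b_t, b_y)` returns 10, the start of the flat stretch, not 20.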
# using df1 defined at bottom
library(ggplot2)

ggplot(df1, aes(datetime, cuml, color = label)) +
  geom_line() +
  geom_point()
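One approach I have considered is to use tidyr::complete
+ zoo::na.approx
to linearly interpolate each series and then join each series against the others to find the closest match, but for my use case (~20 sensors, 200 days at 1-minute resolution) this seems inefficient and hard to work with.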
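library(dplyr); library(tidyr); library(zoo)

# Fill in every minute for each label, then interpolate the missing cuml values
df1 %>%
  complete(label, datetime = seq.POSIXt(
    min(df1$datetime), max(df1$datetime), by = "min")) %>%
  group_by(label) %>%
  # spline fill; zoo::na.approx() would interpolate linearly instead
  mutate(cuml2 = na.spline(cuml, datetime)) %>%
  ungroup() -> df2

# Pair every minute of 2105 with every minute of the other series
# (cross_join() needs dplyr >= 1.1.0; by = character() is deprecated there),
# keep the closest cumulative value, and take the time difference
df2 %>% filter(label == "2105 Line 0-exit") %>%
  cross_join(df2 %>% filter(label != "2105 Line 0-exit")) %>%
  mutate(dif = cuml2.x - cuml2.y) %>%
  group_by(datetime.x) %>%
  slice_min(abs(dif), n = 1) %>%
  ungroup() %>%
  mutate(time_dif = datetime.y - datetime.x)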
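A rough count of the rows the cross join would create at the scale stated in the question (about 20 sensors, 200 days at 1-minute resolution) shows why this approach does not scale. This is plain arithmetic and assumes, as the code does, that the whole 200-day range is completed and joined in one go.

```r
# Figures from the question: ~20 sensors, 200 days at 1-minute resolution
n_sensors <- 20
n_days <- 200
minutes_per_day <- 24 * 60

# Rows per series after complete() fills every minute
rows_per_series <- n_days * minutes_per_day

# The cross join pairs every row of one series with every row of another
rows_per_pair <- rows_per_series^2

# Joining one reference sensor against the other 19 stacked together
rows_per_reference <- rows_per_pair * (n_sensors - 1)

c(rows_per_series = rows_per_series,
  rows_per_pair = rows_per_pair,
  rows_per_reference = rows_per_reference)
```

That is 288,000 rows per series, about 8.3e10 rows for a single pair and about 1.6e12 rows per reference sensor, before `slice_min()` reduces anything.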
Sample data
df1 <- structure(list(datetime = structure(c(1670152500, 1670152500,
1670154300, 1670156100, 1670156100, 1670157900, 1670157900, 1670159700,
1670159700, 1670161500, 1670161500, 1670163300, 1670163300, 1670165100,
1670165100, 1670166900, 1670166900, 1670168700, 1670168700, 1670170500,
1670170500, 1670172300, 1670172300, 1670174100, 1670174100, 1670175900,
1670175900), tzone = "UTC", class = c("POSIXct", "POSIXt")),
label = c("2081 Line 0-exit", "2105 Line 0-exit", "2105 Line 0-exit",
"2081 Line 0-exit", "2105 Line 0-exit", "2081 Line 0-exit",
"2105 Line 0-exit", "2105 Line 0-exit", "2081 Line 0-exit",
"2105 Line 0-exit", "2081 Line 0-exit", "2081 Line 0-exit",
"2105 Line 0-exit", "2105 Line 0-exit", "2081 Line 0-exit",
"2081 Line 0-exit", "2105 Line 0-exit", "2105 Line 0-exit",
"2081 Line 0-exit", "2105 Line 0-exit", "2081 Line 0-exit",
"2105 Line 0-exit", "2081 Line 0-exit", "2105 Line 0-exit",
"2081 Line 0-exit", "2081 Line 0-exit", "2105 Line 0-exit"
), cuml = c(8.30121553513193, 96.9773299748111, 244.892247411139,
213.756300029647, 418.275958578226, 420.249036466054, 636.719843268962,
883.57122865939, 637.118292321376, 1137.27959697733, 891.343018084791,
1178.77260598873, 1388.04925832634, 1725.02099076406, 1407.05603320486,
1710.05040023718, 2025.74867058494, 2349.00643716765, 2043.13667358435,
2668.34592779177, 2346.13104061666, 2935.76826196474, 2649.12540764898,
3198.29275118948, 2988.43759264749, 3285.20604802846, 3421.63448082844
)), row.names = c(NA, -27L), class = c("tbl_df", "tbl", "data.frame"
))
1 Answer
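Read the data into a zoo object z, splitting on the label column so that each series gets its own column; that is, z now has 2 columns, one per series. Then use approxfun to get the time of the second column as a function of the second column's values. Evaluate that function at the first column's values and subtract the first column's times. Take the mean of the differences, ignoring any NAs (for example, where a value falls outside the range of the second series). The result is in seconds. Depending on which direction you want the delay measured in, you may want to reverse the order of the terms in the difference.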
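The steps above can be sketched in base R as follows. This is not the answerer's own code: it uses `split()` on the label column instead of a zoo object, and the function name `mean_horizontal_lag()`, its `from`/`to` arguments, and the toy data are hypothetical. It assumes columns named as in df1 (`datetime`, `label`, `cuml`) and strictly increasing cumulative counts within each series (`approxfun()` warns and averages over tied values otherwise).

```r
# Hypothetical helper following the answer's steps, in base R.
# Assumes columns datetime (POSIXct), label and cuml.
mean_horizontal_lag <- function(df, from, to) {
  s <- split(df, df$label)                     # one data frame per series
  a <- s[[from]][order(s[[from]]$datetime), ]
  b <- s[[to]][order(s[[to]]$datetime), ]
  # time of series `to` as a function of its cumulative value;
  # NA for values outside b's range (approxfun's default rule = 1)
  time_of_b <- approxfun(b$cuml, as.numeric(b$datetime))
  # look up series `from`'s values and subtract its own times (seconds)
  lag_sec <- time_of_b(a$cuml) - as.numeric(a$datetime)
  mean(lag_sec, na.rm = TRUE)
}

# Hypothetical toy data: "out" lags "in" by up to 10 minutes
t0 <- as.POSIXct("2022-12-04 11:15:00", tz = "UTC")
toy <- data.frame(
  datetime = t0 + 60 * c(0, 10, 20, 0, 10, 20, 30),
  label    = c("in", "in", "in", "out", "out", "out", "out"),
  cuml     = c(0, 50, 100, 0, 25, 50, 100)
)

mean_horizontal_lag(toy, "in", "out")        # 400 (seconds)
mean_horizontal_lag(toy, "in", "out") / 60   # same lag in minutes
```

Swapping `from` and `to` reverses the sign convention, which is the "reverse the terms" choice mentioned in the answer; on the toy data `mean_horizontal_lag(toy, "out", "in")` gives -375.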