R: Calculating the time lag between time series

Posted by r3i60tvu on 2022-12-24 in Other

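I have two time series representing cumulative counts of entries to and exits from an area. The vertical distance between the series can be read as the instantaneous occupancy of the area, and the horizontal distance as the average "dwell" time. The real data has 1-minute resolution, with an observation for every minute (sample data at the bottom).
What is a simple, readable, efficient way to calculate the horizontal time lag between the two series over the course of a day? For example, for each point in series 2105, how many minutes pass until the (first) linearly interpolated point on series 2081 with the same y value?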

library(ggplot2)
# using df1 defined at bottom
ggplot(df1, aes(datetime, cuml, color = label)) +
  geom_line() +
  geom_point()

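One approach I considered is to linearly interpolate each series with tidyr::complete + zoo::na.approx, then do a non-equi join of each series against the other to get the closest match. For my use case (about 20 sensors, 200 days at 1-minute resolution) that seems inefficient and unwieldy.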

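library(dplyr); library(tidyr); library(zoo)

# Expand each series to a 1-minute grid and linearly interpolate the gaps
df1 %>%
  complete(label, datetime = seq.POSIXt(
    min(df1$datetime), max(df1$datetime), by = "min")) %>%
  group_by(label) %>%
  mutate(cuml2 = na.approx(cuml, datetime, na.rm = FALSE)) %>%
  ungroup() -> df2

# Cross join the two series (by = character(); dplyr >= 1.1.0 also has cross_join()),
# then keep, for each 2105 timestamp, the 2081 row with the closest cumulative value
df2 %>% filter(label == "2105 Line 0-exit") %>%
  left_join(df2 %>% filter(label != "2105 Line 0-exit"), by = character()) %>%
  mutate(dif = cuml2.x - cuml2.y) %>%
  group_by(datetime.x) %>%
  slice_min(abs(dif), n = 1) %>%
  ungroup() %>%
  mutate(time_dif = datetime.y - datetime.x)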

Sample data

df1 <- structure(list(datetime = structure(c(1670152500, 1670152500, 
  1670154300, 1670156100, 1670156100, 1670157900, 1670157900, 1670159700, 
  1670159700, 1670161500, 1670161500, 1670163300, 1670163300, 1670165100, 
  1670165100, 1670166900, 1670166900, 1670168700, 1670168700, 1670170500, 
  1670170500, 1670172300, 1670172300, 1670174100, 1670174100, 1670175900, 
  1670175900), tzone = "UTC", class = c("POSIXct", "POSIXt")), 
    label = c("2081 Line 0-exit", "2105 Line 0-exit", "2105 Line 0-exit", 
    "2081 Line 0-exit", "2105 Line 0-exit", "2081 Line 0-exit", 
    "2105 Line 0-exit", "2105 Line 0-exit", "2081 Line 0-exit", 
    "2105 Line 0-exit", "2081 Line 0-exit", "2081 Line 0-exit", 
    "2105 Line 0-exit", "2105 Line 0-exit", "2081 Line 0-exit", 
    "2081 Line 0-exit", "2105 Line 0-exit", "2105 Line 0-exit", 
    "2081 Line 0-exit", "2105 Line 0-exit", "2081 Line 0-exit", 
    "2105 Line 0-exit", "2081 Line 0-exit", "2105 Line 0-exit", 
    "2081 Line 0-exit", "2081 Line 0-exit", "2105 Line 0-exit"
    ), cuml = c(8.30121553513193, 96.9773299748111, 244.892247411139, 
    213.756300029647, 418.275958578226, 420.249036466054, 636.719843268962, 
    883.57122865939, 637.118292321376, 1137.27959697733, 891.343018084791, 
    1178.77260598873, 1388.04925832634, 1725.02099076406, 1407.05603320486, 
    1710.05040023718, 2025.74867058494, 2349.00643716765, 2043.13667358435, 
    2668.34592779177, 2346.13104061666, 2935.76826196474, 2649.12540764898, 
    3198.29275118948, 2988.43759264749, 3285.20604802846, 3421.63448082844
    )), row.names = c(NA, -27L), class = c("tbl_df", "tbl", "data.frame"
))

Answer 1 (oaxa6hgo)

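Read the data into a zoo object z, splitting on the label column so that each series gets its own column; z then has 2 columns, one per series. Next use approxfun to express the time of the second column as a function of the second column's value. Look up that function at the first column's values and take the difference from the first column's times. Then take the mean of the differences. The result is in seconds. Depending on the direction you want, you may wish to reverse the terms of the difference.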

library(zoo)

z <- read.zoo(df1, split = "label")
mean(as.numeric(time(z)) - approxfun(z[, 2], time(z))(z[, 1]), na.rm = TRUE)
## [1] 1712.595
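
If a per-point delay is wanted rather than the overall mean, the same inversion idea can be sketched in base R with approx(). This is only a minimal sketch on made-up toy data, not the data from the question. The helper name horizontal_delay and its arguments are hypothetical. It assumes the target series is non-decreasing, and passing ties = min is an assumption intended to pick the earliest time when the cumulative count is flat.

```r
# Hypothetical helper: for each (t_from, y_from) point, find the time at which
# the other series reaches the same cumulative value, and return the delay.
horizontal_delay <- function(t_from, y_from, t_to, y_to) {
  # Invert the target series: interpolate time as a function of cumulative count.
  # ties = min keeps the earliest time where y_to repeats (assumption, see lead-in).
  t_match <- approx(x = y_to, y = as.numeric(t_to), xout = y_from, ties = min)$y
  t_match - as.numeric(t_from)  # seconds; NA where y_from is outside range(y_to)
}

# Toy data: the second series reaches each count 10 minutes after the first.
t0    <- as.POSIXct("2022-12-04 12:00:00", tz = "UTC")
t_in  <- t0 + 60 * c(15, 45, 75)        # query points on the first series
y_in  <- c(50, 200, 350)
t_out <- t0 + 60 * c(10, 40, 70, 100)   # knots of the second series
y_out <- c(0, 100, 300, 400)

delay_min <- horizontal_delay(t_in, y_in, t_out, y_out) / 60
print(delay_min)
```

On this toy input every delay is 10 minutes. Points whose cumulative value lies outside the range of the other series come back as NA, which is why the mean above is taken with na.rm = TRUE.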
