我有一个时间序列数据集,看起来像这样:
我想对ID进行分组并计算开始日期之前、从开始日期到结束日期的时间段内以及结束日期之后的平均值。任何帮助将不胜感激谢谢!
js81xvg61#
library(tidyverse) library(lubridate) # define sample data df <- tibble::tribble( ~id, ~month, ~start, ~end, ~value, "45", "2016-08", "2017-02-01", "2019-03-07", 300, "45", "2016-09", "2017-02-01", "2019-03-07", 345, "45", "2018-09", "2017-02-01", "2019-03-07", 888, "45", "2018-10", "2017-02-01", "2019-03-07", 232, "45", "2023-09", "2017-02-01", "2019-03-07", 456, "113", "2016-08", "2018-05-01", "2020-07-15", 1253, "113", "2016-09", "2018-05-01", "2020-07-15", 3423, "113", "2019-01", "2018-05-01", "2020-07-15", 1353, "113", "2019-02", "2018-05-01", "2020-07-15", 2362, "113", "2023-09", "2018-05-01", "2020-07-15", 2315 ) # convert to dates df <- df |> mutate(month = ym(month), start = ymd(start), end = ymd(end)) # compute means for each period df |> mutate(period = case_when( month %within% interval(start, end) ~ "during", month < start ~ "before", month > end ~ "after" )) |> summarise( mean = mean(value), .by = c(id, period) ) #> # A tibble: 6 × 3 #> id period mean #> <chr> <chr> <dbl> #> 1 45 before 322. #> 2 45 during 560 #> 3 45 after 456 #> 4 113 before 2338 #> 5 113 during 1858. #> 6 113 after 2315
创建于2023年3月16日,使用reprex v2.0.2或者,从整齐的 Dataframe 开始,并将它们连接起来:
library(tidyverse) library(lubridate) # define sample data df1 <- tibble::tribble( ~id, ~start, ~end, "45", "2017-02-01", "2019-03-07", "113", "2018-05-01", "2020-07-15" ) df2 <- tibble::tribble( ~id, ~month, ~value, "45", "2016-08", 300, "45", "2016-09", 345, "45", "2018-09", 888, "45", "2018-10", 232, "45", "2023-09", 456, "113", "2016-08", 1253, "113", "2016-09", 3423, "113", "2019-01", 1353, "113", "2019-02", 2362, "113", "2023-09", 2315 ) # merge month to intervals during <- df1 |> inner_join(df2, join_by(id, between(y$month, x$start, x$end))) |> summarise(during = mean(value), .by = id) before <- df1 |> inner_join(df2, join_by(id, start > month)) |> summarise(before = mean(value), .by = id) after <- df1 |> inner_join(df2, join_by(id, end < month)) |> summarise(after = mean(value), .by = id) # bind results before |> inner_join(during, join_by(id)) |> inner_join(after, join_by(id)) #> # A tibble: 2 × 4 #> id before during after #> <chr> <dbl> <dbl> <dbl> #> 1 45 322. 560 456 #> 2 113 2338 1858. 2315
创建于2023年3月16日,使用reprex v2.0.2
1条答案
按热度按时间js81xvg61#
创建于2023年3月16日,使用reprex v2.0.2
或者,从整齐的 Dataframe 开始,并将它们连接起来:
创建于2023年3月16日,使用reprex v2.0.2