R语言 按标识符分组,然后根据开始日期和结束日期计算平均值

ghg1uchk  于 2023-03-20  发布在  其他
关注(0)|答案(1)|浏览(89)

我有一个时间序列数据集,看起来像这样:

我想对ID进行分组并计算开始日期之前、从开始日期到结束日期的时间段内以及结束日期之后的平均值。
任何帮助将不胜感激谢谢!

js81xvg6

js81xvg61#

library(tidyverse)
library(lubridate)

# define sample data
df <- tibble::tribble(
    ~id,    ~month,       ~start,         ~end, ~value,
   "45", "2016-08", "2017-02-01", "2019-03-07",    300,
   "45", "2016-09", "2017-02-01", "2019-03-07",    345,
   "45", "2018-09", "2017-02-01", "2019-03-07",    888,
   "45", "2018-10", "2017-02-01", "2019-03-07",    232,
   "45", "2023-09", "2017-02-01", "2019-03-07",    456,
  "113", "2016-08", "2018-05-01", "2020-07-15",   1253,
  "113", "2016-09", "2018-05-01", "2020-07-15",   3423,
  "113", "2019-01", "2018-05-01", "2020-07-15",   1353,
  "113", "2019-02", "2018-05-01", "2020-07-15",   2362,
  "113", "2023-09", "2018-05-01", "2020-07-15",   2315
  )

# convert to dates
df <- df |> 
  mutate(month = ym(month),
         start = ymd(start),
         end = ymd(end))

# compute means for each period
df |> 
  mutate(period = case_when(
    month %within% interval(start, end) ~ "during",
    month < start ~ "before",
    month > end ~ "after"
  )) |> 
  summarise(
    mean = mean(value),
    .by = c(id, period)
  )
#> # A tibble: 6 × 3
#>   id    period  mean
#>   <chr> <chr>  <dbl>
#> 1 45    before  322.
#> 2 45    during  560 
#> 3 45    after   456 
#> 4 113   before 2338 
#> 5 113   during 1858.
#> 6 113   after  2315

创建于2023年3月16日,使用reprex v2.0.2
或者,从整齐的 Dataframe 开始,并将它们连接起来:

library(tidyverse)
library(lubridate)

# define sample data
df1 <- tibble::tribble(
   ~id,       ~start,         ~end,
  "45", "2017-02-01", "2019-03-07",
 "113", "2018-05-01", "2020-07-15"
)

df2 <- tibble::tribble(
    ~id,    ~month, ~value,
   "45", "2016-08",    300,
   "45", "2016-09",    345,
   "45", "2018-09",    888,
   "45", "2018-10",    232,
   "45", "2023-09",    456,
  "113", "2016-08",   1253,
  "113", "2016-09",   3423,
  "113", "2019-01",   1353,
  "113", "2019-02",   2362,
  "113", "2023-09",   2315
)

# merge month to intervals
during <- df1 |> 
  inner_join(df2, join_by(id, between(y$month, x$start, x$end))) |> 
  summarise(during = mean(value), .by = id)

before <- df1 |> 
  inner_join(df2, join_by(id, start > month)) |> 
  summarise(before = mean(value), .by = id)

after <- df1 |> 
  inner_join(df2, join_by(id, end < month)) |> 
  summarise(after = mean(value), .by = id)

# bind results
before |> 
  inner_join(during, join_by(id)) |> 
  inner_join(after, join_by(id))
#> # A tibble: 2 × 4
#>   id    before during after
#>   <chr>  <dbl>  <dbl> <dbl>
#> 1 45      322.   560    456
#> 2 113    2338   1858.  2315

创建于2023年3月16日,使用reprex v2.0.2

相关问题