R语言 如何按日期聚合变量

e5njpo68  于 2023-11-14  发布在  其他
关注(0)|答案(2)|浏览(136)

我有一个带有列的R数据集:

  • 第一个月
  • country_code
  • version
  • money

变量money是数值型的。变量versioncountry_code是分类型的。datecountry_codeversion的每个组合都定义了一个唯一的观测值,money的值将被测量。
我想对变量versioncountry_code进行分组,并计算数据集中的3个新变量,它们是:

  • money_week:对于每个观察,此变量存储前7天沿着的money的聚合。
  • money_15:对于每个观测,此变量存储沿着过去15天的money聚合
  • money_total:对于每个观察,此变量存储money到当前日期的聚合。

country_codeversion的不同组合可能具有不同的开始和结束日期,并且也可能缺少日期。

示例

data = tibble(
date = seq(as.Date('2022-01-01'), as.Date('2022-01-13'), by=1),
       money = 1:13)
data = data[-c(2,3,4,10,11),]

# [Some computations for adding the expected new variables]

# A tibble: 10 × 2
  date       money    money_week
2022-01-01     1          1
2022-01-05     5          6
2022-01-06     6          12
2022-01-07     7          19
2022-01-08     8          26
2022-01-09     9          35        
2022-01-12    12          42
2022-01-13    13          49

字符串
注意,每个观测值不只是前7个观测值的总和,而是前7天的总和。例如,日期2022-01-08的观测值是日期2022-01-082022-01-072022-01-062022-01-05的观测值的总和,它将包括观测值2022-01-042022-01-032022-01-02但最后一个不见了。

u5rb5r59

u5rb5r591#

您可以使用重叠联接来合并从日期到日期间隔(分别为7天或15天)的值:

library(tidyverse)

# define sample data
data = tibble(
  date = seq(as.Date('2022-01-01'), as.Date('2022-01-13'), by=1),
  money = 1:13)

data = data[-c(2,3,4,10,11),]

# prepare intervals for overlap joins
df <- data |> 
  mutate(
    date = ymd(date),
    date_7 = date - days(6),
    date_15 = date - days(14)
  )

# merge by 7-day interval
sum_7 <- df |> 
  select(date, date_7) |> 
  inner_join(df, join_by(between(y$date, x$date_7, x$date))) |> 
  summarize(money = sum(money), .by = date.x) |> 
  rename(date = date.x, money_7 = money)

# merge by 15-day interval
sum_15 <- df |> 
  select(date, date_15) |> 
  inner_join(df, join_by(between(y$date, x$date_15, x$date))) |> 
  summarize(money = sum(money), .by = date.x) |> 
  rename(date = date.x, money_15 = money)

# bind cols
data |> 
  inner_join(sum_7, join_by(date)) |> 
  inner_join(sum_15, join_by(date))
#> # A tibble: 8 × 4
#>   date       money money_7 money_15
#>   <date>     <int>   <int>    <int>
#> 1 2022-01-01     1       1        1
#> 2 2022-01-05     5       6        6
#> 3 2022-01-06     6      12       12
#> 4 2022-01-07     7      19       19
#> 5 2022-01-08     8      26       27
#> 6 2022-01-09     9      35       36
#> 7 2022-01-12    12      42       48
#> 8 2022-01-13    13      49       61

字符串
创建于2023-11-12使用reprex v2.0.2

dkqlctbz

dkqlctbz2#

您可以使用date变量作为滑动窗口函数的索引。

library(slider)
library(tidyverse)

data %>%
  mutate(money_week = slide_index_dbl(money, date, sum, .before = days(6)))
#> # A tibble: 8 × 3
#>   date       money money_week
#>   <date>     <int>      <dbl>
#> 1 2022-01-01     1          1
#> 2 2022-01-05     5          6
#> 3 2022-01-06     6         12
#> 4 2022-01-07     7         19
#> 5 2022-01-08     8         26
#> 6 2022-01-09     9         35
#> 7 2022-01-12    12         42
#> 8 2022-01-13    13         49

字符串

更多测试数据展示money_15

set.seed(123)
newdata = tibble(
  date = seq(as.Date('2022-01-01'), as.Date('2022-03-31'), by=1),
  money = sample(1:20,90, replace = TRUE)) %>%
  slice_sample(n = 60) %>%
  arrange(date)

newdata %>%
  mutate(money_week = slide_index_dbl(money, date, sum, .before = days(6)),
         money_15 = slide_index_dbl(money, date, sum, .before = days(14)),
         money_total = cumsum(money))
#> # A tibble: 60 × 5
#>    date       money money_week money_15 money_total
#>    <date>     <int>      <dbl>    <dbl>       <int>
#>  1 2022-01-01    15         15       15          15
#>  2 2022-01-02    19         34       34          34
#>  3 2022-01-03    14         48       48          48
#>  4 2022-01-05    10         58       58          58
#>  5 2022-01-06    18         76       76          76
#>  6 2022-01-07    11         87       87          87
#>  7 2022-01-08     5         77       92          92
#>  8 2022-01-09    20         78      112         112
#>  9 2022-01-11     5         69      117         117
#> 10 2022-01-13     9         50      126         126
#> # ℹ 50 more rows

相关问题