R: cut a data frame, but make sure all steps are included

Posted by sxpgvts3 on 2022-12-20 in Other

Suppose I have the following data:

test_data <- dplyr::tibble(
  ID = c(1, 1, 1, 1, 1, 1, 1),
  values = c(40, 41, 38, 36, 35, 36, 30),
  times = c(as.POSIXct("2020-01-01 00:00:00"),
            as.POSIXct("2020-01-01 15:00:00"),
            as.POSIXct("2020-01-01 18:00:00"),
            as.POSIXct("2020-01-02 14:00:00"),
            as.POSIXct("2020-01-03 20:00:00"),
            as.POSIXct("2020-01-05 10:00:00"),
            as.POSIXct("2020-01-05 14:00:00")))

Now I want to extract the last value of each day, counting days from the first time step. To do that, I did the following:

library(dplyr)  # provides %>%, group_by(), filter(), select(), row_number(), n()

test_data %>%
  dplyr::mutate(diff = as.double.difftime(times - min(times), units = "days")) %>%
  dplyr::mutate(day = cut(diff, breaks = 0:6, include.lowest = TRUE, right = TRUE, ordered_result = TRUE)) %>%
  group_by(ID, day) %>%
  filter(row_number()==n()) %>%
  select(ID, day, values) %>%
  tidyr::pivot_wider(names_from = day, values_from = values)

which gives:

     ID `[0,1]` `(1,2]` `(2,3]` `(4,5]`
  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1     1      38      36      35      30

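However, as you can see, one step is missing because there is no data between day 3 and day 4. Is there a way to make sure every interval appears in the result, with NA for the intervals that have no data? My only idea so far is to add a "dummy user" that has data in every interval to the data frame, so that all intervals are guaranteed to be present. So what I would like to get is: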

     ID `[0,1]` `(1,2]` `(2,3]` `(3,4]` `(4,5]`
  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1     1      38      36      35      NA      30
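
A short base-R sketch of why the column disappears: cut() does create a (3,4] level, but no observation falls into it, so grouping on the factor drops it before the pivot. The time zone is fixed to "UTC" here only to make the sketch reproducible; that is an assumption, not part of the original data.

```r
# Same timestamps as in test_data, with an explicit time zone (assumption)
times <- as.POSIXct(c("2020-01-01 00:00:00", "2020-01-01 15:00:00",
                      "2020-01-01 18:00:00", "2020-01-02 14:00:00",
                      "2020-01-03 20:00:00", "2020-01-05 10:00:00",
                      "2020-01-05 14:00:00"), tz = "UTC")

# Days elapsed since the first observation
diff_days <- as.numeric(difftime(times, min(times), units = "days"))

# Same binning as in the pipeline above
day <- cut(diff_days, breaks = 0:6, include.lowest = TRUE, right = TRUE)

# table() counts every factor level, including the empty ones
counts <- table(day)
print(counts)
```

The (3,4] and (5,6] levels exist in the factor but have a count of zero, which is why they never show up as groups.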

Answer #1 (2ic8powd)

You can find the dates that are missing from the dataset and add a row for each of them, like this:

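library(dplyr)
# Calendar dates present in the data, read in the session time zone (as.Date() on POSIXct may convert in UTC)
obs_dates <- unique(as.Date(format(test_data$times, "%Y-%m-%d")))
seq_dates <- tibble(times = seq(min(obs_dates), max(obs_dates), by = "days"))
missing_dates <- seq_dates %>% filter(!times %in% obs_dates)
# Put filler rows at local noon: midnight sits exactly on a break and would fall into the previous right-closed interval
missing_dates$times <- as.POSIXct(format(missing_dates$times, "%Y-%m-%d 12:00:00"))
missing_dates$ID <- 1
missing_dates$values <- NA_real_
missing_dates <- missing_dates %>% select(ID, values, times)
test_data <- test_data %>% bind_rows(missing_dates) %>% arrange(times)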
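
It is worth checking which interval a filler timestamp ends up in. With right = TRUE the intervals are closed on the right, so a filler row at exactly midnight, three full days after the first observation, is binned into (2,3] rather than (3,4]. A small base-R sketch, using a fixed "UTC" time zone and two hypothetical filler times purely for illustration:

```r
# First observation and two candidate filler timestamps (hypothetical values)
first  <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")
filler <- as.POSIXct(c("2020-01-04 00:00:00", "2020-01-04 12:00:00"), tz = "UTC")

# Elapsed days: 3 and 3.5
d <- as.numeric(difftime(filler, first, units = "days"))

# Same breaks and options as in the question's pipeline
bins <- as.character(cut(d, breaks = 0:6, include.lowest = TRUE, right = TRUE))
print(bins)
```

Only the noon timestamp lands in (3,4]; the midnight one sits on the break and goes to the lower interval.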

Then run your code:

test_data %>%
  dplyr::mutate(diff = as.double.difftime(times - min(times), units = "days")) %>%
  dplyr::mutate(day = cut(diff, breaks = 0:6, include.lowest = TRUE, right = TRUE, ordered_result = TRUE)) %>%
  group_by(ID, day) %>%
  filter(row_number()==n()) %>%
  select(ID, day, values) %>%
  tidyr::pivot_wider(names_from = day, values_from = values)

and you get the desired result:

     ID `[0,1]` `(1,2]` `(2,3]` `(3,4]` `(4,5]`
  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1     1      38      36      35      NA      30
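
As a possible alternative that avoids filler rows, here is a sketch that relies on the factor levels cut() already creates. It is a different technique from the one above, and it assumes tidyr 1.2.0 or later for the names_expand argument of pivot_wider(). The explicit tz = "UTC", the data-driven upper break (ceiling(max(diff))) and the use of slice_tail() instead of filter(row_number() == n()) are choices made for this sketch only.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, with an explicit time zone (assumption)
test_data <- tibble(
  ID = rep(1, 7),
  values = c(40, 41, 38, 36, 35, 36, 30),
  times = as.POSIXct(c("2020-01-01 00:00:00", "2020-01-01 15:00:00",
                       "2020-01-01 18:00:00", "2020-01-02 14:00:00",
                       "2020-01-03 20:00:00", "2020-01-05 10:00:00",
                       "2020-01-05 14:00:00"), tz = "UTC")
)

res <- test_data %>%
  mutate(
    # Days elapsed since the first observation
    diff = as.numeric(difftime(times, min(times), units = "days")),
    # Breaks stop at the last day that has data, so no trailing empty
    # interval is created (assumes the data span more than zero days)
    day = cut(diff, breaks = 0:ceiling(max(diff)),
              include.lowest = TRUE, right = TRUE)
  ) %>%
  group_by(ID, day) %>%
  slice_tail(n = 1) %>%   # last row of each ID/interval group
  ungroup() %>%
  select(ID, day, values) %>%
  # names_expand = TRUE turns unused factor levels into NA-filled columns
  pivot_wider(names_from = day, values_from = values, names_expand = TRUE)

print(res)
```

Under these assumptions the result should contain a (3,4] column holding NA without any change to test_data itself. On older tidyr versions, calling tidyr::complete(ID, day) before the pivot is a comparable option.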
