R: Cut a data frame, but ensure all steps are included

Posted by sxpgvts3 on 2022-12-20 in Other

Suppose I have the following data:

test_data <- dplyr::tibble(
  ID = c(1, 1, 1, 1, 1, 1, 1),
  values = c(40, 41, 38, 36, 35, 36, 30),
  times = c(as.POSIXct("2020-01-01 00:00:00"),
            as.POSIXct("2020-01-01 15:00:00"),
            as.POSIXct("2020-01-01 18:00:00"),
            as.POSIXct("2020-01-02 14:00:00"),
            as.POSIXct("2020-01-03 20:00:00"),
            as.POSIXct("2020-01-05 10:00:00"),
            as.POSIXct("2020-01-05 14:00:00")))

Now I want to extract the last value of each day, counting days from the first timestamp. To do that, I did the following:

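library(dplyr)  # provides %>%, group_by(), filter(), select()

test_data %>%
  # elapsed time since the first observation, in days
  dplyr::mutate(diff = as.double.difftime(times - min(times), units = "days")) %>%
  # bin the elapsed time into one-day intervals
  dplyr::mutate(day = cut(diff, breaks = 0:6, include.lowest = TRUE, right = TRUE, ordered_result = TRUE)) %>%
  group_by(ID, day) %>%
  # keep the last row of each ID/day group
  filter(row_number()==n()) %>%
  select(ID, day, values) %>%
  tidyr::pivot_wider(names_from = day, values_from = values)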

which gives:

ID `[0,1]` `(1,2]` `(2,3]` `(4,5]`
  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1     1      38      36      35      30

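However, as you can see, one step is missing because there is no data between day 3 and day 4. Is there a way to ensure that all intervals are included in the result, with NA for those that have no data? My only idea is to add a "dummy user" with data in every interval to the data frame, so that all intervals are guaranteed to appear. So what I want is: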

ID `[0,1]` `(1,2]` `(2,3]` `(3,4]` `(4,5]`
  <dbl>   <dbl>   <dbl>   <dbl> <dbl>  <dbl>
1     1      38      36      35   NA   30
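
The gap arises because no row falls into `(3,4]`, so no group is formed for it, even though `cut()` still records that interval as a factor level. A minimal base R sketch of this, reusing the timestamps from above (the `tz = "UTC"` argument and the names `elapsed`, `day` and `counts` are assumptions added for reproducibility and illustration):

```r
# Timestamps from the example; tz = "UTC" is an assumption for reproducibility
times <- as.POSIXct(c("2020-01-01 00:00:00", "2020-01-01 15:00:00",
                      "2020-01-01 18:00:00", "2020-01-02 14:00:00",
                      "2020-01-03 20:00:00", "2020-01-05 10:00:00",
                      "2020-01-05 14:00:00"), tz = "UTC")

# Elapsed time since the first observation, in days
elapsed <- as.numeric(difftime(times, min(times), units = "days"))

# Bin into one-day intervals; intervals without data remain as factor levels
day <- cut(elapsed, breaks = 0:6, include.lowest = TRUE, right = TRUE)

# Rows per interval: "(3,4]" and "(5,6]" are levels with a count of zero
counts <- table(day)
print(counts)
```

So the information about the empty interval is present in the factor; it is lost only when the rows are grouped and pivoted.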

Source: https://stackoverflow.com/questions/74809363/r-cut-dataframe-but-ensure-all-steps

1 Answer

2ic8powd1#

You can look for the dates that are missing from the dataset and fill in the corresponding rows, like this:

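library(dplyr)

# All calendar dates between the first and the last observation
obs_dates <- unique(as.Date(test_data$times))
seq_dates <- tibble(times = seq(min(obs_dates), max(obs_dates), by = "days"))
# Dates on which nothing was observed
missing_dates <- seq_dates %>% filter(!times %in% obs_dates)
# Build placeholder rows (values = NA) with the same columns as test_data
missing_dates$times <- as.POSIXct(missing_dates$times)
missing_dates$ID <- 1
missing_dates$values <- NA
missing_dates <- missing_dates %>% select(ID, values, times)
# Append the placeholder rows and restore chronological order
test_data <- test_data %>% bind_rows(missing_dates) %>% arrange(times)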

Then run your code:

test_data %>%
  dplyr::mutate(diff = as.double.difftime(times - min(times), units = "days")) %>%
  dplyr::mutate(day = cut(diff, breaks = 0:6, include.lowest = TRUE, right = TRUE, ordered_result = TRUE)) %>%
  group_by(ID, day) %>%
  filter(row_number()==n()) %>%
  select(ID, day, values) %>%
  tidyr::pivot_wider(names_from = day, values_from = values)

and you get the desired result:

ID `[0,1]` `(1,2]` `(2,3]` `(3,4]` `(4,5]`
 <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1     1      38      36      35      NA      30
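
As a possible alternative that avoids adding placeholder rows, the empty interval can be kept by letting `tidyr::pivot_wider()` expand the factor levels of `day`. This is a sketch rather than the method used above. It assumes tidyr 1.2.0 or later (for the `names_expand` argument), fixes the time zone to UTC for reproducibility, and uses `breaks = 0:5` so that no empty `(5,6]` column appears. It starts from the original `test_data`, without any filler rows, and so sidesteps the conversion between `Date` and `POSIXct`.

```r
library(dplyr)
library(tidyr)

# Original example data; tz = "UTC" is an assumption for reproducibility
test_data <- tibble(
  ID = rep(1, 7),
  values = c(40, 41, 38, 36, 35, 36, 30),
  times = as.POSIXct(c("2020-01-01 00:00:00", "2020-01-01 15:00:00",
                       "2020-01-01 18:00:00", "2020-01-02 14:00:00",
                       "2020-01-03 20:00:00", "2020-01-05 10:00:00",
                       "2020-01-05 14:00:00"), tz = "UTC")
)

result <- test_data %>%
  arrange(ID, times) %>%
  mutate(
    # elapsed time since the first observation, in days
    diff = as.numeric(difftime(times, min(times), units = "days")),
    # one-day intervals; breaks = 0:5 is chosen to match the desired columns
    day = cut(diff, breaks = 0:5, include.lowest = TRUE, right = TRUE)
  ) %>%
  group_by(ID, day) %>%
  filter(row_number() == n()) %>%   # last row of each ID/day group
  ungroup() %>%
  select(ID, day, values) %>%
  # names_expand = TRUE creates a column for every factor level of `day`,
  # including levels with no rows, which are filled with NA
  pivot_wider(names_from = day, values_from = values, names_expand = TRUE)

print(result)
```

Because the columns come from the factor levels rather than from the rows, the set of intervals is controlled entirely by the `breaks` argument of `cut()`.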

Answered 2022-12-20