Speeding up a kind of data frame stretching in R

Posted by ncgqoxb0 on 2023-03-05 in Other

I have the following travel booking data:

library(dplyr)
library(tsibble)  # provides yearmonth()
bookings <- data.frame(Route = 'AB', DepartureMonth = rep(yearmonth("2013-11"),9),
           EffectiveFrom = c(rep(yearmonth("2013-07"),5), c(rep(yearmonth("2013-08"),4))),
           EffectiveTo = c("2013-08", "2013-09", "2013-10", "2013-11", "2199-12",
                           "2013-09", "2013-10", "2013-11", "2199-12"),
           ConfirmedBooking = c(16, 6, 8, 15, 15, 76, 95, 81, 202)) %>%
  mutate(EffectiveTo = yearmonth(EffectiveTo))
bookings
Route DepartureMonth EffectiveFrom    EffectiveTo ConfirmedBooking
AB      2013 nov.       2013 july       2013 aug.       16
AB      2013 nov.       2013 july       2013 sept.      6
AB      2013 nov.       2013 july       2013 oct.       8
AB      2013 nov.       2013 july       2013 nov.       15
AB      2013 nov.       2013 july       2199 dec.       15
AB      2013 nov.       2013 aug.       2013 sept.      76
AB      2013 nov.       2013 aug.       2013 oct.       95
AB      2013 nov.       2013 aug.       2013 nov.       81
AB      2013 nov.       2013 aug.       2199 dec.       202

I wrote some code to get the number of bookings at the end of each month before departure.
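The problem is that the code takes almost an hour to run when I start from a 170,000-row data frame and end up with a 34,000-row data frame.
The booking_month_decomposition function takes too long. Am I using the wrong purrr function?
PS: *The number of bookings should increase as we get closer to departure, but that is not visible here because, to simplify, I shortened the departures data frame and bookings are cancelled from September onwards.*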
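To make the target quantity concrete, here is a minimal base-R sketch for a single Route/DepartureMonth group. It assumes, as the filters in the first answer do, that a row counts toward month m when EffectiveFrom <= m < min(EffectiveTo, DepartureMonth). Months are encoded as integers by a hypothetical helper ym() instead of tsibble's yearmonth, and the row-by-month indicator matrix is built with outer(), so there is no explicit loop.

```r
# Hypothetical helper: "YYYY-MM" -> running month index (year * 12 + month - 1)
ym <- function(x) as.integer(substr(x, 1, 4)) * 12L + as.integer(substr(x, 6, 7)) - 1L
# Hypothetical inverse helper, only used to label the result
ym_label <- function(m) sprintf("%04d-%02d", m %/% 12L, m %% 12L + 1L)

# The question's sample, restricted to the columns needed here
bookings <- data.frame(
  EffectiveFrom = ym(rep(c("2013-07", "2013-08"), c(5, 4))),
  EffectiveTo = ym(c("2013-08", "2013-09", "2013-10", "2013-11", "2199-12",
                     "2013-09", "2013-10", "2013-11", "2199-12")),
  ConfirmedBooking = c(16, 6, 8, 15, 15, 76, 95, 81, 202)
)
departure <- ym("2013-11")

# Months strictly before departure, starting at the earliest EffectiveFrom
months <- seq(min(bookings$EffectiveFrom), departure - 1L)

# active[i, j] is TRUE when row i is in effect during months[j]
end_month <- pmin(bookings$EffectiveTo, departure)
active <- outer(bookings$EffectiveFrom, months, "<=") & outer(end_month, months, ">")

# Weight each row by its bookings (recycled down the columns) and sum per month
totals <- colSums(active * bookings$ConfirmedBooking)
names(totals) <- ym_label(months)
totals
```

On the sample this gives 60, 498, 416 and 313 for 2013-07 to 2013-10. The matrix has one cell per row and month, so for many departures it would have to be built group by group.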

whlutmcx (answer #1)

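Row-by-row loops are indeed slow in R, so it is best to replace them with vectorized operations wherever possible.
For your use case, you can create a column BookingMonth that holds, for each row, the seq of months between EffectiveFrom and end_month (the earlier of EffectiveTo and DepartureMonth), and then unnest that column so that each row is duplicated once per month.
You can then filter out the rows where EffectiveTo equals BookingMonth (because EffectiveTo is exclusive), group_by route, departure month and booking month, sum the ConfirmedBooking variable, and finally drop the departure month itself.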

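library(dplyr)
library(tidyr)
library(tsibble)

bookings = bookings %>%
  # Last month to count: EffectiveTo, capped at the departure month
  mutate(end_month = if_else(EffectiveTo < DepartureMonth, EffectiveTo, DepartureMonth)) %>%
  # Rows that share all four of these values fall into the same group
  group_by(Route, EffectiveTo, ConfirmedBooking, DepartureMonth) %>%
  # List column with every month from EffectiveFrom to end_month
  do(
    BookingMonth = seq(from = min(.$EffectiveFrom), to = max(.$end_month), by = 1)
  ) %>%
  # One row per (group, month)
  unnest(BookingMonth) %>%
  ungroup() %>%
  # EffectiveTo is exclusive: drop the month in which the booking stops
  filter(EffectiveTo != BookingMonth) %>%
  group_by(Route, DepartureMonth, BookingMonth) %>%
  summarise(ConfirmedBooking = sum(ConfirmedBooking), .groups = "drop") %>%
  # The departure month itself is not "before departure"
  filter(DepartureMonth != BookingMonth)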
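The same expand-then-sum idea can be sketched in base R, reusing a hypothetical integer month encoding ym(): rep() repeats each row once per month it covers, and sequence() (whose from argument needs R 4.0.0 or later) generates all the month runs in one vectorized call, playing the role of do() plus unnest(). Every original row is expanded on its own, so rows with identical values are never merged. This is an illustrative equivalent under those assumptions, not the answer's code.

```r
# Hypothetical helper: "YYYY-MM" -> running month index (year * 12 + month - 1)
ym <- function(x) as.integer(substr(x, 1, 4)) * 12L + as.integer(substr(x, 6, 7)) - 1L

bookings <- data.frame(
  Route = "AB",
  DepartureMonth = ym("2013-11"),
  EffectiveFrom = ym(rep(c("2013-07", "2013-08"), c(5, 4))),
  EffectiveTo = ym(c("2013-08", "2013-09", "2013-10", "2013-11", "2199-12",
                     "2013-09", "2013-10", "2013-11", "2199-12")),
  ConfirmedBooking = c(16, 6, 8, 15, 15, 76, 95, 81, 202)
)

# Each row covers EffectiveFrom .. min(EffectiveTo, DepartureMonth) - 1,
# i.e. n months (EffectiveTo and the departure month are both excluded)
end_month <- pmin(bookings$EffectiveTo, bookings$DepartureMonth)
n <- pmax(end_month - bookings$EffectiveFrom, 0L)

# Vectorized "unnest": repeat each row n times and attach its run of months
long <- bookings[rep(seq_len(nrow(bookings)), n),
                 c("Route", "DepartureMonth", "ConfirmedBooking")]
long$BookingMonth <- sequence(n, from = bookings$EffectiveFrom)

# Sum confirmed bookings per route, departure month and booking month
result <- aggregate(ConfirmedBooking ~ Route + DepartureMonth + BookingMonth,
                    data = long, FUN = sum)
result
```

The work is done by a few vectorized calls over the whole table rather than one function call per row, which is usually where row-wise purrr or do() pipelines lose time.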

mwngjboj (answer #2)

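I tried @Wawv's solution. It is very useful and fast, but it does not give the correct result on another sample (it underestimates some booking months):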

library(dplyr)    # for %>%
library(tsibble)  # for yearmonth()

bookings <- data.frame(Route = 'AB',
    DepartureMonth = rep(yearmonth("2013-12"), 2),
    EffectiveFrom = c("2013-08", "2013-09") %>% yearmonth(),
    EffectiveTo = c("2013-11", "2013-11") %>% yearmonth(),
    ConfirmedBooking = 60)
bookings
Route   DepartureMonth  EffectiveFrom   EffectiveTo ConfirmedBooking
AB      2013 dec.       2013 aug.       2013 nov.       60
AB      2013 dec.       2013 sept.      2013 nov.       60

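So I made some changes.
With @Wawv's code I get the following, although September and October should each be 120, since both bookings are in effect in those months: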

Route   DepartureMonth  BookingMonth    ConfirmedBooking
AB      2013 dec.       2013 aug.       60
AB      2013 dec.       2013 sept.      60
AB      2013 dec.       2013 oct.       60
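The shortfall appears to come from the first answer's group_by(Route, EffectiveTo, ConfirmedBooking, DepartureMonth) step: both sample rows share all four values, so they collapse into a single group whose month range starts at the earliest EffectiveFrom. A minimal base-R sketch of that effect and of one possible fix (expanding each original row separately), again with hypothetical integer month helpers; it is not necessarily the change the poster made. In the dplyr pipeline, the analogous change would presumably be adding a row identifier such as mutate(row_id = row_number()) to the grouping.

```r
# Hypothetical helpers: "YYYY-MM" <-> running month index (year * 12 + month - 1)
ym <- function(x) as.integer(substr(x, 1, 4)) * 12L + as.integer(substr(x, 6, 7)) - 1L
ym_label <- function(m) sprintf("%04d-%02d", m %/% 12L, m %% 12L + 1L)

# The second sample: two rows that differ only in EffectiveFrom
bookings <- data.frame(
  Route = "AB",
  DepartureMonth = ym("2013-12"),
  EffectiveFrom = ym(c("2013-08", "2013-09")),
  EffectiveTo = ym(c("2013-11", "2013-11")),
  ConfirmedBooking = 60
)

# Grouping key used in the first answer: both rows share it, so they form one group
key <- c("Route", "EffectiveTo", "ConfirmedBooking", "DepartureMonth")
n_groups <- nrow(unique(bookings[key]))
n_groups

# Possible fix: expand each original row on its own, then sum per month
end_month <- pmin(bookings$EffectiveTo, bookings$DepartureMonth)
n <- pmax(end_month - bookings$EffectiveFrom, 0L)
months <- sequence(n, from = bookings$EffectiveFrom)
totals <- tapply(rep(bookings$ConfirmedBooking, n), months, sum)
names(totals) <- ym_label(as.integer(names(totals)))
totals
```

Here n_groups is 1 for two input rows, while the per-row expansion gives 60, 120 and 120 for 2013-08, 2013-09 and 2013-10.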
