用lubridate比较行之间的日期

k10s72fa  于 11个月前  发布在  其他
关注(0)|答案(3)|浏览(117)

我尝试比较行之间的日期并对其应用逻辑。我使用代码here作为起点,但实际日期比较时出现错误。
我想确定的是
1.后续购买之间的时间间隔(天),以及
1.如果时间为60天或更长,并且
1.什么是当前和以前的购买
样品是:

mydata <- data.frame(store=c('A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'),
                     PurchaseDate =c('2023-01-01', '2023-01-01', '2023-02-01', '2023-03-01', '2023-04-01',    '2023-04-01', '2023-05-01', '2023-06-01'), 
                 sales=c("apples", "bananas", "cherries", "bananas", "cherries", "cherries", "bananas", "bananas"))

mydata$PurchaseDate <- as.Date(mydata$PurchaseDate)

mydata %>%
  group_by(store) |> arrange((PurchaseDate), .by_group = TRUE)

字符串
我想返回的是这样的东西,以显示后续购买之间的天数,如果是60天或更长时间,以及之前的购买是什么。最终我会比较销售和最后的销售。

store  PurchaseDate  sales      LastPurchase    over60   LastSale
A      2023-01-01   apples        0             0          NA
A      2023-02-01   cherries     32             0          apples
A      2023-04-02   cherries     61             1          cherries
A      2023-05-01   bananas      31             0          cherries
B      2023-01-01   bananas      0              0          NA
B      2023-03-02   bananas      61             1          bananas
B      2023-04-01   cherries     31             0          bananas
B      2023-06-01   bananas      62             1          cherries


我试过了,但现在我知道这里的时间差是错的,但我需要的是相邻行之间的时间差。

mydata %>%
  group_by(store) |> arrange((PurchaseDate), .by_group = TRUE) |>
  mutate(over60 = ifelse(abs(difftime(PurchaseDate, lag(sales, default =first(PurchaseDate)), units = "days")) < 60, 0, 1))

bxfogqkk

bxfogqkk1#

library(dplyr)
mydata |>
  group_by(store) |> 
  arrange((PurchaseDate), .by_group = TRUE) |>
  mutate(Day = as.numeric(difftime(as.Date(PurchaseDate), lag(as.Date(PurchaseDate)), units = "days")), 
         Over60 = ifelse(Day > 60L, 1L, 0L), 
         LastSale = lag(sales))

字符串

# A tibble: 8 × 6
# Groups:   store [2]
  store PurchaseDate sales      Day Over60 LastSale
  <chr> <chr>        <chr>    <dbl>  <int> <chr>   
1 A     2023-01-01   apples      NA     NA NA      
2 A     2023-02-01   cherries    31      0 apples  
3 A     2023-04-01   cherries    59      0 cherries
4 A     2023-05-01   bananas     30      0 cherries
5 B     2023-01-01   bananas     NA     NA NA      
6 B     2023-03-01   bananas     59      0 bananas 
7 B     2023-04-01   cherries    31      0 bananas 
8 B     2023-06-01   bananas     61      1 cherries


Day列中有NA可以提供潜在的优势。显然,它总是取决于任务。请参阅?lag以获得澄清。使用{dplyr}中的lag-函数,R base提供了另一个没有default=-参数的lag-函数。

4xy9mtcn

4xy9mtcn2#

我认为这段代码在你的请求中创建了三个特定的列。基数r中的diff函数计算连续的PurchaseDate值之间的差值。它从0开始表示第一个PurchaseDate,因为没有以前的PurchaseDate可以比较。

mydata %>%
  group_by(store)  |> 
  arrange((PurchaseDate), .by_group = TRUE)  |> 
  mutate(LastPurchase = c(0, diff(PurchaseDate)),
  over60 = ifelse(LastPurchase < 60, 0, 1),
  LastSale = lag(sales))

字符串
结果是这样的:

store PurchaseDate sales    LastPurchase over60 LastSale
  <chr> <date>       <chr>           <dbl>  <dbl> <chr>   
1 A     2023-01-01   apples              0      0 NA      
2 A     2023-02-01   cherries           31      0 apples  
3 A     2023-04-01   cherries           59      0 cherries
4 A     2023-05-01   bananas            30      0 cherries
5 B     2023-01-01   bananas             0      0 NA      
6 B     2023-03-01   bananas            59      0 bananas 
7 B     2023-04-01   cherries           31      0 bananas 
8 B     2023-06-01   bananas            61      1 cherries

zbsbpyhn

zbsbpyhn3#

mydata  %>% 
arrange(store, PurchaseDate) %>% 
group_by(store) %>% 
mutate(Days = difftime(PurchaseDate, lag(PurchaseDate, default = min(mydata$PurchaseDate)), units = 'days'), 
       over60 = if_else(Days <= 60, 0, 1), last_sale = lag(sales))  %>% 
ungroup()

字符串
导致

store PurchaseDate sales    Days    over60 last_sale
  <chr> <date>       <chr>    <drtn>   <dbl> <chr>    
1 A     2023-01-01   apples    0 days      0 NA       
2 A     2023-02-01   cherries 31 days      0 apples   
3 A     2023-04-01   cherries 59 days      0 cherries 
4 A     2023-05-01   bananas  30 days      0 cherries 
5 B     2023-01-01   bananas   0 days      0 NA       
6 B     2023-03-01   bananas  59 days      0 bananas  
7 B     2023-04-01   cherries 31 days      0 bananas  
8 B     2023-06-01   bananas  61 days      1 cherries

相关问题