R: Merging data from two datasets to create new variables

Posted by 9rnv2umw on 2023-04-18 in Other

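I have a dataset with information on customer lifetimes. I would like to create a timeline containing the dates and the relationship lengths (in days or months) of the first customer lifetime, the period in between, and the second customer lifetime. The timeline should look as follows: first_date - first_lifetime_length - defection_date - defection_length - reactivation_date - second_lifetime_length - last_date
The problem is that the data sits in two different datasets (customer_data and purchase_history), in which customers are linked through the ID column. The top of each dataset is shown below (only the relevant variables are included for now), with customer_data on the left and purchase_history on the right.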

ID first_date                    ID date_of_purchase 
1  2014-01-13                    1 2014-09-12
2  2016-03-01                    1 2014-11-12
3  2016-06-13                    1 2015-02-13
4  2013-12-02                    1 2017-02-14
5  2017-03-27                    1 2018-12-13
6  2016-04-29                    1 2019-04-15
7  2017-11-01                    2 2016-03-01
8  2016-04-07                    3 2016-06-13
9  2016-02-29                    3 2016-09-20
10 2014-12-15                    3 2016-10-20

The code I have written so far looks like this:

# Timeline variables needed for further calculation ----
## First purchase date
customer_data$first_date

## End of First-LT, begin defection period
# A customer is considered defected after not purchasing for 365 days
purchase_history <- purchase_history %>%
  group_by(ID) %>%
  mutate(defected = if_else(date_of_purchase - lag(date_of_purchase) > 365, 1, 0))

## Defection date
purchase_history$defection_date <- if_else(purchase_history$defected==1, purchase_history$date_of_purchase, NA)

Finally, I would like to join all the variables by ID and keep the complete purchase_history. Could someone help me create the relevant variables?
Kind regards

Answer 1 (mklgxw1f)

Join your first data frame ("first purchase") to "purchase history" with a left join, so that no customers are lost:

library(tidyverse)

# First-purchase data
cx_first_purchase <- data.frame(
  ID = 1:10,
  first_date = as.Date(c("2014-01-13", "2016-03-01", "2016-06-13", "2013-12-02", "2017-03-27",
                         "2016-04-29", "2017-11-01", "2016-04-07", "2016-02-29", "2014-12-15"))
)

# Purchase history data
purchase_history <- data.frame(
  ID = c(rep(1, 6), 2, rep(3, 3)),
  date_of_purchase = as.Date(c("2014-09-12", "2014-11-12", "2015-02-13", "2017-02-14", "2018-12-13",
                               "2019-04-15", "2016-03-01", "2016-06-13", "2016-09-20", "2016-10-20"))
)

# Flag purchases made more than 365 days after the same customer's previous one;
# coalesce() turns the NA produced for each customer's first purchase into 0
purchase_history <- purchase_history %>%
  group_by(ID) %>%
  mutate(
    defected = coalesce(if_else(date_of_purchase - lag(date_of_purchase) > 365, 1, 0), 0),
    # as.Date(NA) keeps the column a Date on dplyr versions that reject a bare NA
    defection_date = if_else(defected == 1, date_of_purchase, as.Date(NA))
  )

# The left join keeps every customer, including those without any purchases
cx_first_purchase %>%
  left_join(purchase_history, by = "ID")

# output

   ID first_date date_of_purchase defected defection_date
1   1 2014-01-13       2014-09-12        0           <NA>
2   1 2014-01-13       2014-11-12        0           <NA>
3   1 2014-01-13       2015-02-13        0           <NA>
4   1 2014-01-13       2017-02-14        1     2017-02-14
5   1 2014-01-13       2018-12-13        1     2018-12-13
6   1 2014-01-13       2019-04-15        0           <NA>
7   2 2016-03-01       2016-03-01        0           <NA>
8   3 2016-06-13       2016-06-13        0           <NA>
9   3 2016-06-13       2016-09-20        0           <NA>
10  3 2016-06-13       2016-10-20        0           <NA>
11  4 2013-12-02             <NA>       NA           <NA>
12  5 2017-03-27             <NA>       NA           <NA>
13  6 2016-04-29             <NA>       NA           <NA>
14  7 2017-11-01             <NA>       NA           <NA>
15  8 2016-04-07             <NA>       NA           <NA>
16  9 2016-02-29             <NA>       NA           <NA>
17 10 2014-12-15             <NA>       NA           <NA>
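
The rows flagged above are the purchases that follow a gap of more than 365 days, so they could also be read as reactivation dates rather than defection dates. The following is only a rough sketch of how the remaining timeline variables might be derived from the same data. It assumes that a customer counts as defected exactly 365 days after the last purchase before the gap; that rule and the column names prev_purchase, gap_days, reactivated, reactivation_date and defection_length are illustrative choices, not something stated in the question.

```r
library(dplyr)

# Same purchase history as in the answer above
purchase_history <- data.frame(
  ID = c(rep(1, 6), 2, rep(3, 3)),
  date_of_purchase = as.Date(c(
    "2014-09-12", "2014-11-12", "2015-02-13", "2017-02-14", "2018-12-13",
    "2019-04-15", "2016-03-01", "2016-06-13", "2016-09-20", "2016-10-20"
  ))
)

timeline <- purchase_history %>%
  arrange(ID, date_of_purchase) %>%
  group_by(ID) %>%
  mutate(
    # Previous purchase of the same customer (NA for the first purchase)
    prev_purchase = lag(date_of_purchase),
    # Days elapsed since the previous purchase
    gap_days = as.numeric(date_of_purchase - prev_purchase),
    # TRUE when this purchase ends a gap of more than 365 days
    reactivated = !is.na(gap_days) & gap_days > 365,
    # ASSUMPTION: defection is dated 365 days after the last purchase before the gap
    defection_date = if_else(reactivated, prev_purchase + 365, as.Date(NA)),
    # The purchase that ends the gap is treated as the reactivation date
    reactivation_date = if_else(reactivated, date_of_purchase, as.Date(NA)),
    # Days between the assumed defection date and the reactivation
    defection_length = if_else(reactivated, gap_days - 365, NA_real_)
  ) %>%
  ungroup()

timeline
```

With this data only customer 1 has gaps longer than 365 days, so every other row keeps NA in the new date columns. If defection should instead be dated at the last purchase before the gap, replacing prev_purchase + 365 with prev_purchase would be the only change needed.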

Answer 2 (lsmd5eda)

I am not sure what the end goal is, but this might be a start.
It uses a full join (merge with all = TRUE) and a modified df2 that includes defection_date:

library(dplyr)

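merge(
  df1,
  df2 %>%
    group_by(ID) %>%
    # Keep the purchase date only where the gap since the previous purchase exceeds 365 days
    mutate(defection_date = if_else(
      date_of_purchase - lag(date_of_purchase) > 365,
      date_of_purchase, as.Date(NA)
    )),
  by = "ID", all = TRUE
)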
   ID first_date date_of_purchase defection_date
1   1 2014-01-13       2014-09-12           <NA>
2   1 2014-01-13       2014-11-12           <NA>
3   1 2014-01-13       2015-02-13           <NA>
4   1 2014-01-13       2017-02-14     2017-02-14
5   1 2014-01-13       2018-12-13     2018-12-13
6   1 2014-01-13       2019-04-15           <NA>
7   2 2016-03-01       2016-03-01           <NA>
8   3 2016-06-13       2016-06-13           <NA>
9   3 2016-06-13       2016-09-20           <NA>
10  3 2016-06-13       2016-10-20           <NA>
11  4 2013-12-02             <NA>           <NA>
12  5 2017-03-27             <NA>           <NA>
13  6 2016-04-29             <NA>           <NA>
14  7 2017-11-01             <NA>           <NA>
15  8 2016-04-07             <NA>           <NA>
16  9 2016-02-29             <NA>           <NA>
17 10 2014-12-15             <NA>           <NA>
Data
df1 <- structure(list(ID = 1:10, first_date = c("2014-01-13", "2016-03-01", 
"2016-06-13", "2013-12-02", "2017-03-27", "2016-04-29", "2017-11-01", 
"2016-04-07", "2016-02-29", "2014-12-15")), class = "data.frame", 
row.names = c(NA, -10L))

df2 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L, 3L, 3L), 
    date_of_purchase = structure(c(16325, 16386, 16479, 17211, 
    17878, 18001, 16861, 16965, 17064, 17094), class = "Date")), 
row.names = c(NA, -10L), class = "data.frame")
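
Note that df1 above stores first_date as character strings, while date_of_purchase in df2 is a Date. As a possible next step toward the one-row-per-customer timeline the question asks for, the sketch below converts first_date to a Date and attaches each customer's last purchase date. The names customer_summary, last_date, n_purchases and result are made up for illustration, and taking last_date as the latest purchase is an assumption about what the asker means.

```r
library(dplyr)

# Same data as above, rebuilt with data.frame() for readability
df1 <- data.frame(
  ID = 1:10,
  first_date = c("2014-01-13", "2016-03-01", "2016-06-13", "2013-12-02", "2017-03-27",
                 "2016-04-29", "2017-11-01", "2016-04-07", "2016-02-29", "2014-12-15"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID = c(rep(1L, 6), 2L, rep(3L, 3)),
  date_of_purchase = as.Date(c(
    "2014-09-12", "2014-11-12", "2015-02-13", "2017-02-14", "2018-12-13",
    "2019-04-15", "2016-03-01", "2016-06-13", "2016-09-20", "2016-10-20"
  ))
)

# One row per customer that appears in the purchase history
customer_summary <- df2 %>%
  group_by(ID) %>%
  summarise(
    last_date = max(date_of_purchase),  # ASSUMPTION: last_date = latest purchase
    n_purchases = n(),
    .groups = "drop"
  )

# Convert first_date to Date, then keep all customers via a left join
result <- df1 %>%
  mutate(first_date = as.Date(first_date)) %>%
  left_join(customer_summary, by = "ID")

result
```

Customers without any recorded purchases (IDs 4 to 10 here) keep NA in last_date and n_purchases, which mirrors the NA rows in the joined output shown earlier.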
