如果在过去24小时内R中有记录,如何返回TRUE?

ghhkc1vu  于 2023-01-22  发布在  其他
关注(0)|答案(3)|浏览(125)

我是R的新手,我正在努力解决这个问题。我有两个数据集。
首先是每个客户的交易记录:

transactions <- dplyr::tibble(order_id = c(1,2,3),
                              customer_id = c(1,1,2),
                              order_date =c("01.02.2023 10:00:00","01.04.2023 10:00:00","01.02.2023 10:00:00"))

第二种是当客户收到一封电子邮件时,客户可以在两个订单之间收到多封电子邮件:

emails <- dplyr::tibble(email_id = c(1,2,3,4,5,6),
                        customer_id = c(1,1,1,1,2,2),
                        email_date =c("01.01.2023 10:00:00",
                                      "01.01.2023 12:00:00",
                                      "01.02.2023 08:00:00",
                                      "01.02.2023 09:00:00",
                                      "01.02.2023 10:00:00",
                                      "01.02.2023 11:00:00"))

我想在第一个数据集中添加一列,如果客户在交易前24小时内收到电子邮件,则该列将返回TRUE。
| 客户标识|订单标识|订单日期|最近24小时的电子邮件|
| - ------|- ------|- ------|- ------|
| 1个|1个|二○二三年二月一日十时|正确|
| 1个|第二章|2023年2月1日09时00分|错误|

ztyzrc3y

ztyzrc3y1#

制作测试数据集:

transactions <- dplyr::tibble(customer_id = c(1,2),
                       order_date =c("01.02.2023 10:00:00","01.02.2023 10:00:00"))

emails <- dplyr::tibble(customer_id = c(1,1,2,2),
                 email_date =c("01.02.2023 09:00:00",
                               "01.07.2022 09:00:00",
                               "31.12.2022 09:00:00",
                               "11.10.2022 09:00:00"))

transactions$order_date <- lubridate::dmy_hms(transactions$order_date)
emails$email_date <- lubridate::dmy_hms(emails$email_date)

写一个你想要的函数:

has_received_email <- function(customer_id, order_date, emails) {
  purrr::map2_lgl(customer_id, order_date, 
                  function(.customer_id, .order_date) {
                    
                    email_dates <- emails %>% 
                      dplyr::filter(customer_id == .customer_id) %>% 
                      dplyr::pull(email_date)
                    
                    any(difftime(.order_date, email_dates, units="hours")<=24)
                    
                  })
}

使用它向事务表中添加列:

transactions %>% 
  dplyr::mutate(received_email = has_received_email(customer_id, order_date, emails))

| 客户标识|订单日期|已接收_电子邮件|
| - ------|- ------|- ------|
| 1个|2023年2月1日10时00分|正确|
| 第二章|2023年2月1日10时00分|错误|
或以其他方式加入、分组和总结:

dplyr::left_join(transactions, emails, by="customer_id") %>% 
  dplyr::group_by_at(all_of(names(transactions))) %>% 
  dplyr::summarize(received_email = any(difftime(order_date, email_date, units="hours")<=24))
0s0u357o

0s0u357o2#

您可以在R中使用merge()函数连接customer_id列上的两个数据集,然后使用difftime()函数计算email_dateorder_date列之间的时间差。可以使用ifelse()语句创建一个新列,如果时差小于或等于24小时,则返回“TRUE”,否则返回“FALSE”。

# Load the two datasets
transactions <- read.csv("transactions.csv")
emails <- read.csv("emails.csv")

# Merge the datasets on customer_id
merged_data <- merge(transactions, emails, by="customer_id")

# Calculate time difference between email_date and order_date
merged_data$time_difference <- difftime(merged_data$order_date, merged_data$email_date, units="hours")

# Create a new column that returns TRUE if time difference is less than or equal to 24 hours
merged_data$email_last_24_hours <- ifelse(merged_data$time_difference <= 24, "TRUE", "FALSE")

或使用lapply:

transactions$email_last_24_hours <- lapply(transactions$customer_id, function(x){
  email_date <- emails[emails$customer_id == x, "email_date"]
  if(is.na(email_date) || difftime(transactions[transactions$customer_id == x, "order_date"], email_date, units="hours") > 24) {
    return("FALSE")
  } else {
    return("TRUE")
  }
})
qyswt5oh

qyswt5oh3#

为了让这个解决方案真正发挥作用,您还应该有一个事务ID,或者至少有一些东西来标识每个客户的订单。
使用dplyrlubridate,您可以执行以下操作

library(tidyverse)

df_1 %>% 
  left_join(df_2) %>% 
  mutate(
    emailed_24 = email_date %within% interval(order_date, order_date %m-% days(1))
  ) # Checking if email_date is within an interval, order_date - 24h (days(1))

# A tibble: 2 × 4
  customer_id order_date          email_date          emailed_24
        <dbl> <dttm>              <dttm>              <lgl>     
1           1 2023-02-01 10:00:00 2023-02-01 09:00:00 TRUE      
2           2 2023-02-01 10:00:00 2022-12-31 09:00:00 FALSE

相关问题