R: Concatenate strings based on the values in another column

mm5n2pyu  posted on 2022-12-30  in Other

I am trying to concatenate strings within specific customer ID / order date groups. I have a data frame:

customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
orderdate <- c("2018-09-14", "2018-09-14", "2018-09-15", "2018-09-15", "2020-08-21", "2020-08-21","2020-08-21", "2018-08-10")
orderid <- c("1", "2", "3", "4", "5", "6", "7", "8")
status <- c("review", "review", "review", "negative", "positive", "review", "review", "review")
df <- data.frame(customerid, orderdate, orderid, status)

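I want to group by customer ID and order date. Then, within each group, I want every "review" except one changed to "duplicate", and all order IDs concatenated per customerid/orderdate. The result would be: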

customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
orderdate <- c("2018-09-14", "2018-09-14", "2018-09-15", "2018-09-15", "2020-08-21", "2020-08-21","2020-08-21", "2018-08-10")
orderid <- c("1,2", "1,2", "3,4", "3,4", "5,6,7", "5,6,7", "5,6,7", "8")
status <- c("review", "duplicate", "review", "negative", "positive", "review", "duplicate", "review")
df <- data.frame(customerid, orderdate, orderid, status)

Thanks!

aij0ehis #1

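You can try the following, which joins the order IDs of each group with toString() and marks every "review" after the first one in a group as "duplicate":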

library(dplyr)

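df %>%
  group_by(customerid, orderdate) %>%
  # toString() joins the group's order IDs with ", "
  mutate(orderid = toString(orderid),
         # within a group, a "review" that repeats an earlier one becomes "duplicate"
         status = ifelse(status == "review" & duplicated(status), "duplicate", status)) %>%
  ungroup()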

# A tibble: 8 × 4
  customerid orderdate  orderid status
  <chr>      <chr>      <chr>   <chr>    
1 A1         2018-09-14 1, 2    review
2 A1         2018-09-14 1, 2    duplicate
3 A2         2018-09-15 3, 4    review
4 A2         2018-09-15 3, 4    negative
5 A3         2020-08-21 5, 6, 7 positive
6 A3         2020-08-21 5, 6, 7 review
7 A3         2020-08-21 5, 6, 7 duplicate
8 A4         2018-08-10 8       review
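
For anyone wondering why the IDs above come out as "5, 6, 7" rather than "5,6,7", here is a small base R sketch of the two building blocks, run on the values of a single group (A3 / 2020-08-21) from the question. The variable names `ids` and `st` are only for illustration and are not part of the original answer.

```r
# Values of one group from the question (customer A3, 2020-08-21)
ids <- c("5", "6", "7")
st  <- c("positive", "review", "review")

# toString() separates with comma + space; paste(collapse = ",") with a bare comma
with_space <- toString(ids)               # "5, 6, 7"
no_space   <- paste(ids, collapse = ",")  # "5,6,7"

# duplicated() is FALSE at the first occurrence of a value and TRUE afterwards
is_repeat <- duplicated(st)               # FALSE FALSE TRUE

# Only a "review" that repeats an earlier "review" is relabelled
new_status <- ifelse(st == "review" & is_repeat, "duplicate", st)
# "positive" "review" "duplicate"
```

So if the exact format from the question is needed, swapping toString(orderid) for paste(orderid, collapse = ",") in the pipeline should be enough.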

tzdcorbm #2

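You can try the following, which collapses the order IDs of each group with paste0(..., collapse = ",") and marks every row after the first one in a group as "duplicate", whatever its status was: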

library(dplyr)
df %>%
  group_by(customerid, orderdate) %>%
  mutate(orderid = paste0(orderid, collapse = ","),
         # keep the first row's status, overwrite all later rows in the group
         status = ifelse(row_number() == 1, status, "duplicate")) %>%
  ungroup()

  customerid orderdate  orderid status   
  <chr>      <chr>      <chr>   <chr>    
1 A1         2018-09-14 1,2     review   
2 A1         2018-09-14 1,2     duplicate
3 A2         2018-09-15 3,4     review   
4 A2         2018-09-15 3,4     duplicate
5 A3         2020-08-21 5,6,7   positive 
6 A3         2020-08-21 5,6,7   duplicate
7 A3         2020-08-21 5,6,7   duplicate
8 A4         2018-08-10 8       review

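I'm not sure about the exact condition for negative/positive, but the following changes only the repeated "review" rows and leaves the other statuses untouched: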

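df %>%
  group_by(customerid, orderdate) %>%
  mutate(orderid = paste0(orderid, collapse = ",")) %>%
  group_by(customerid, orderdate, status) %>%
  # dplyr refuses to modify a grouping column, so flag repeats in a helper column first
  mutate(dup = row_number() != 1) %>%
  ungroup() %>%
  mutate(status = ifelse(dup & (status == "review"), "duplicate", status)) %>%
  select(-dup)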

  customerid orderdate  orderid status   
  <chr>      <chr>      <chr>   <chr>    
1 A1         2018-09-14 1,2     review   
2 A1         2018-09-14 1,2     duplicate
3 A2         2018-09-15 3,4     review   
4 A2         2018-09-15 3,4     negative 
5 A3         2020-08-21 5,6,7   positive 
6 A3         2020-08-21 5,6,7   review   
7 A3         2020-08-21 5,6,7   duplicate
8 A4         2018-08-10 8       review
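
If dplyr is not available, the same result can probably be reached in base R. The following is only a sketch under that assumption, not something either answer proposed: it uses ave() to paste the IDs per group and duplicated() on the three key columns to find the repeated "review" rows. The helper names `grp` and `is_repeat` are made up for this example.

```r
customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
orderdate <- c("2018-09-14", "2018-09-14", "2018-09-15", "2018-09-15",
               "2020-08-21", "2020-08-21", "2020-08-21", "2018-08-10")
orderid <- c("1", "2", "3", "4", "5", "6", "7", "8")
status <- c("review", "review", "review", "negative",
            "positive", "review", "review", "review")
df <- data.frame(customerid, orderdate, orderid, status, stringsAsFactors = FALSE)

# One grouping factor per customerid/orderdate combination that actually occurs
grp <- interaction(df$customerid, df$orderdate, drop = TRUE)

# ave() applies the function within each group and returns a vector
# aligned with the original rows (the single pasted string is recycled)
df$orderid <- ave(df$orderid, grp, FUN = function(x) paste(x, collapse = ","))

# A row is a repeat if the same customerid/orderdate/status appeared earlier
is_repeat <- duplicated(df[c("customerid", "orderdate", "status")])

# Relabel only the repeated "review" rows
df$status[df$status == "review" & is_repeat] <- "duplicate"
```

With the question's data this should give the same orderid and status columns as the expected result shown in the question.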
