Counting item combinations in R

Posted by sqxo8psd on 2023-02-10 in: Other

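I have purchase data with the number of purchases of each brand by each customer. My dt_grouped looks like this:
| ID | Brand | nr_purchases |
|----|-------|--------------|
| 1 | Brand1 | 1 |
| 1 | Brand2 | 2 |
| 2 | Brand1 | 3 |
| 2 | Brand2 | 2 |
| 2 | Brand3 | 5 |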
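I want to compute a table that tells me, for each household, how many times it bought each combination of 2 brands (e.g. ID = 2 bought Brand1 and Brand2 5 times). I only want pairs, not combinations of more brands such as Brand1 & Brand2 & Brand3.
My expected output is:
| ID | Brand | nr_purchases |
|----|-------|--------------|
| 1 | Brand1 | 1 |
| 1 | Brand2 | 2 |
| 1 | Brand1 & Brand2 | 3 |
| 2 | Brand1 | 3 |
| 2 | Brand2 | 2 |
| 2 | Brand3 | 5 |
| 2 | Brand1 & Brand2 | 5 |
| 2 | Brand2 & Brand3 | 7 |
| 2 | Brand1 & Brand3 | 8 |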
I tried this code:

dt_grouped <- dt %>% group_by(ID, Brand) %>% 
  summarise(nr_purchases = sum(nr_products_bought))

# Create a new data table with the combinations of brands for each ID
dt_result <- data.table()
for (id in unique(dt$ID)) {
  brands <- dt_grouped[ID == id, Brand]
  brands <- paste(brands, collapse = ", ")
  nr_purchases <- sum(dt_grouped[ID == id, nr_purchases])
  dt_result <- rbind(dt_result, data.table(ID = id, Brand = brands, nr_purchases = nr_purchases))
}

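Unfortunately, it gives me an error: "in dt_grouped[ID == id, Brand] object HHKEY, Brand not found". Do you know why this error occurs? Or is there perhaps a more efficient way to code this?
Thanks for your help.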
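The error most likely comes from mixing dplyr and data.table: `group_by()` followed by `summarise()` returns a (grouped) tibble, not a data.table, so `dt_grouped[ID == id, Brand]` is not evaluated inside the table and R looks for a free-standing object with the column's name, which matches the "object ... not found" message. A minimal sketch, assuming hypothetical raw data that has the `nr_products_bought` column from the question's code: doing the aggregation in data.table keeps the data.table class, so the subsetting works (`as.data.table(dt_grouped)` after the dplyr step should also do).

```r
library(data.table)

# Hypothetical raw data: one row per purchase record, using the
# nr_products_bought column name from the question's code
dt <- data.table(
  ID = c(1, 1, 1, 2, 2, 2, 2),
  Brand = c("Brand1", "Brand2", "Brand2", "Brand1", "Brand1", "Brand2", "Brand3"),
  nr_products_bought = c(1, 1, 1, 2, 1, 2, 5)
)

# Aggregate with data.table instead of dplyr: the result stays a data.table,
# so dt_grouped[ID == id, Brand] is evaluated against its columns
dt_grouped <- dt[, .(nr_purchases = sum(nr_products_bought)), by = .(ID, Brand)]

dt_grouped[ID == 2, Brand]
# [1] "Brand1" "Brand2" "Brand3"
```

With `dt_grouped` as a data.table the loop in the question runs, but it still pastes all brands of an ID into one string instead of building pairs, which is what the answers below address.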

Answer #1 (tuwxkamq)

Here is an approach based on a self-join:

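library(data.table)

# Self-join dt on ID (dt must be a data.table with columns ID, Brand,
# nr_purchases): every brand of a household is paired with every brand
# of the same household
combos = merge(dt, dt, by = "ID", allow.cartesian = TRUE)[
  # keep each unordered pair once and drop brand-with-itself rows
  Brand.x < Brand.y,
][
  ,
  c("nr_purchases", "Brand") := list(
    nr_purchases.x + nr_purchases.y,
    paste(Brand.x, Brand.y, sep = " & ")
  )
][
  , c("ID", "Brand", "nr_purchases")
]

# Stack the single-brand rows and the pair rows
rbind(dt, combos)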
#    ID           Brand nr_purchases
# 1:  1          Brand1            1
# 2:  1          Brand2            2
# 3:  2          Brand1            3
# 4:  2          Brand2            2
# 5:  2          Brand3            5
# 6:  1 Brand1 & Brand2            3
# 7:  2 Brand1 & Brand2            5
# 8:  2 Brand1 & Brand3            8
# 9:  2 Brand2 & Brand3            7
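Why the `Brand.x < Brand.y` filter is needed: the cartesian self-join yields every ordered pairing within an ID, including a brand paired with itself, so an ID with k brands contributes k * k rows; keeping only `Brand.x < Brand.y` leaves each unordered pair of distinct brands once, k * (k - 1) / 2 rows. A small check on the question's sample data, rebuilt here as a data.table:

```r
library(data.table)

# The question's sample data
dt <- data.table(
  ID = c(1, 1, 2, 2, 2),
  Brand = c("Brand1", "Brand2", "Brand1", "Brand2", "Brand3"),
  nr_purchases = c(1, 2, 3, 2, 5)
)

# Every ordered brand pairing within each ID, self-pairs included
all_pairings <- merge(dt, dt, by = "ID", allow.cartesian = TRUE)
nrow(all_pairings)  # 2*2 + 3*3 = 13

# Each unordered pair of distinct brands exactly once
unique_pairs <- all_pairings[Brand.x < Brand.y]
nrow(unique_pairs)  # 1 + 3 = 4
```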

Answer #2 (uwopmtnx)

Three minutes too late... Here is an answer using data.table.

library(data.table)

data <- data.table(
  ID = c(1, 1, 2, 2, 2),
  Brand = c("Brand1", "Brand2", "Brand1", "Brand2", "Brand3"),
  nr_purchases = c(1, 2, 3, 2, 5)
)

# For each ID: total purchases per brand, then every pair of brands
data[, {
  tb <- tapply(nr_purchases, Brand, sum)
  cm <- combn(Brand, 2)
  list(Brand        = apply(cm, 2, function(x) paste0(x, collapse = " & ")),
       nr_purchases = apply(cm, 2, function(x) sum(tb[x])))
}, by = .(ID)]

#    ID           Brand nr_purchases
# 1:  1 Brand1 & Brand2            3
# 2:  2 Brand1 & Brand2            5
# 3:  2 Brand1 & Brand3            8
# 4:  2 Brand2 & Brand3            7
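This answer returns only the pair rows, and `combn()` stops with an "n < m" error for any ID that bought a single brand. A minimal sketch that handles both, assuming a hypothetical third household (ID = 3) with one brand: the `if` without `else` yields `NULL` for such IDs, and data.table drops groups whose `j` returns `NULL`; the pairs are then stacked under the original rows to match the expected output.

```r
library(data.table)

# Sample data plus a hypothetical ID = 3 that bought only one brand
data <- data.table(
  ID = c(1, 1, 2, 2, 2, 3),
  Brand = c("Brand1", "Brand2", "Brand1", "Brand2", "Brand3", "Brand1"),
  nr_purchases = c(1, 2, 3, 2, 5, 4)
)

# Pairwise totals per ID; IDs with fewer than two distinct brands
# return NULL and are skipped instead of making combn() fail
pairs <- data[, if (uniqueN(Brand) > 1) {
  tb <- tapply(nr_purchases, Brand, sum)   # total per distinct brand
  cm <- combn(names(tb), 2)                # one column per brand pair
  list(Brand        = apply(cm, 2, paste, collapse = " & "),
       nr_purchases = apply(cm, 2, function(x) sum(tb[x])))
}, by = ID]

# Single-brand rows first, then the pair rows
result <- rbind(data, pairs)
```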

Answer #3 (pcww981p)

Using combn:

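library(dplyr)  # reframe() and .by require dplyr >= 1.1.0
library(tidyr)  # unnest_wider()

# For each ID, take every pair of row positions and return, per pair,
# the joined brand names and the summed purchases as a list column
df %>% 
  reframe(result = combn(seq_along(nr_purchases), 2, function(i)
    list(Brand = paste(Brand[i], collapse = " & "),
         nr_purchases = sum(nr_purchases[i])),
    simplify = FALSE), .by = ID) %>% 
  # spread each list into Brand and nr_purchases columns
  unnest_wider(result) %>% 
  # put the original single-brand rows first
  bind_rows(df, .)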

  ID           Brand nr_purchases
1  1          Brand1            1
2  1          Brand2            2
3  2          Brand1            3
4  2          Brand2            2
5  2          Brand3            5
6  1 Brand1 & Brand2            3
7  2 Brand1 & Brand2            5
8  2 Brand1 & Brand3            8
9  2 Brand2 & Brand3            7
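The core of this answer is `combn(x, m, FUN, simplify = FALSE)`, which calls `FUN` on each size-`m` combination of `x` and returns the results as a list. A stand-alone base R illustration on the ID = 2 rows of the question (the variable names are only for this example):

```r
# The three brands bought by ID = 2 and their purchase counts
brand <- c("Brand1", "Brand2", "Brand3")
nr_purchases <- c(3, 2, 5)

# combn() enumerates index pairs (1,2), (1,3), (2,3) and applies FUN to each
combos <- combn(seq_along(nr_purchases), 2, function(i)
  list(Brand = paste(brand[i], collapse = " & "),
       nr_purchases = sum(nr_purchases[i])),
  simplify = FALSE)

length(combos)              # 3
combos[[2]]$Brand           # "Brand1 & Brand3"
combos[[2]]$nr_purchases    # 8
```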
