sql查询产品的频率分布矩阵

mjqavswn 于 2021-06-26 发布在 Hive

关注(0)|答案(7)|浏览(265)

我想创建一个频率分布矩阵

1.Create a matrix.**Is it possible to get this in separate columns**

  customer1       p1         p2      p3
  customer 2      p2         p3
  customer 3      p2         p3      p1
  customer 4      p2         p1

2. Then I have to count the number of products that come together the most

   For eg  
    p2 and p3 comes together 3 times
    p1 p3   comes 2 times
    p1 p2  comes  2 times

I want to recommend products to customers ,frequency of products that comes together

 select customerId,product,count(*) from sales group by customerId,product

有人能帮我解决这个问题吗

sql Hive apache-spark hiveql

来源：https://stackoverflow.com/questions/42749016/sql-query-frequency-distribution-matrix-for-product

7条答案

按热度按时间

wb1gzix01#

--------3
薯片|焦炭

赞(0）回复(0）举报 2021-06-26

eivgtgni2#

--------1
可口可乐糖果

赞(0）回复(0）举报 2021-06-26

6ojccjat3#

数组

select      customerId
           ,sort_array(collect_set (product))   as products

from        sales 

group by    customerId
;

多列

select      customerId

           ,products[0] as p0
           ,products[1] as p1
           ,products[2] as p2
           ,products[3] as p3
           ,products[4] as p4
           ,products[5] as p5
           ,products[6] as p6
           ,products[7] as p7
           ,products[8] as p8
           ,products[9] as p9

from       (select      customerId
                       ,sort_array(collect_set (product))   as products

            from        sales 

            group by    customerId
            ) s
;

这将返回整个产品组合的频率。
在你的例子中 (p1,p2,p3) 是最常见的（出现两次）。 (p1,p2) 只出现一次 (p2,p3) .
有关元组的频率，请参见@gordonlinoff answer。

select      s.products
           ,count(*)    as frequency 

from       (select      customerId
                       ,sort_array(collect_set (product))   as products

            from        sales 

            group by    customerId
            ) s

group by    s.products           

order by    frequency desc

赞(0）回复(0）举报 2021-06-26

m2xkgtsf4#

如果您需要客户购买的成对产品，则可以使用自联接：

select s1.product, s2.product, count(*) as cnt
from sales s1 join
     sales s2
     on s1.customerId = s2.customerId
where s1.product < s2.product
group by s1.product, s2.product
order by cnt desc;

通过使用更多的连接，可以将其扩展到两个以上的产品。

赞(0）回复(0）举报 2021-06-26

i86rm4rw5#

------1
上述结果的解释：
大多数情况下，3次，客户只购买可乐
接下来，有两次顾客买了薯片和可乐
曾经有个顾客只买芯片
有一次一个顾客买了一瓶可乐和糖果

赞(0）回复(0）举报 2021-06-26

9udxz4iz6#

--------2
芯片

赞(0）回复(0）举报 2021-06-26

gjmwrych7#

我知道光标已经不再是这个月的特色了，但我仍然不时地发现它们非常有用
下面的代码创建一个表变量，然后循环遍历销售数据，找出最常一起销售的产品组合

--Results will be loaded into this table
DECLARE @OutputCombination TABLE (CombinationName VARCHAR(MAX), CombinationsFound INT) 

--Just for demo - create table and load with data
declare @demodata table (ClientID int, ProductBought varchar(50))
insert into @demodata (Clientid, ProductBought) values 
(1,'Coke'),
(1,'Chips'),
(1,'Chips'),
(2,'Coke'),
(2,'Coke'),
(2,'Sweets'),
(3,'Coke'),
(4,'Chips'),
(5,'Coke'),
(5,'Chips'),
(6,'Coke'),
(7,'Coke')

DECLARE clients CURSOR
READ_ONLY
FOR SELECT DISTINCT clientID from @demodata

DECLARE @clientID INT
OPEN clients

FETCH NEXT FROM clients INTO @clientID
WHILE (@@fetch_status <> -1)
BEGIN
       DECLARE @ThisClientCombination VARCHAR(MAX) --This is going to be used to find/update combinations of products sold from the data - pipe-delimiting them
       SET @ThisClientCombination  = '' --better to actually wipe variables in cursor loops, as cursors sometimes funny about things like that
       ;WITH thisClientsDistinctCoverages AS --CTE used because of the SQL-funnies caused when using DISTINCT and ORDER BY
       (
              SELECT DISTINCT TOP 100 PERCENT CoverageItems = ProductBought + '|' FROM @demodata WHERE ClientID = @clientID ORDER BY ProductBought + '|' --order alphabetically and do DISTINCT so that 2x cokes are changed into just 1 coke - for the sake of combos
       )
       SELECT @ThisClientCombination = @ThisClientCombination + CoverageItems FROM thisClientsDistinctCoverages
       SET @ThisClientCombination = LEFT(@ThisClientCombination,LEN(@ThisClientCombination)-1) --Drop trailing Pipe '|'

       IF EXISTS(SELECT * FROM @OutputCombination WHERE CombinationName = @ThisClientCombination)
              UPDATE @OutputCombination SET CombinationsFound = CombinationsFound + 1 WHERE CombinationName = @ThisClientCombination --found before, increase count by 1
       ELSE
              INSERT INTO @OutputCombination (CombinationName, CombinationsFound) VALUES (@ThisClientCombination, 1)--first time found, make entry 1

       FETCH NEXT FROM clients INTO @clientID
END

CLOSE clients
DEALLOCATE clients

--Show the results
SELECT * FROM @OutputCombination ORDER BY CombinationsFound DESC, CombinationName

这将产生以下结果：
组合名称------组合基金
焦炭

赞(0）回复(0）举报 2021-06-26

我来回答

sql查询产品的频率分布矩阵

7条答案

相关问题

热门标签

最新问答