查找与某个类别交互的重复用户数

wa7juj8i  于 2021-06-26  发布在  Hive
关注(0)|答案(2)|浏览(459)

我是sql的新手,正在尝试解决以下问题。。想象下表:

user_id, category_id
1, 12344
1, 12344
1, 12345
2, 12345
2, 12345
3, 12344
3, 12344

以此类推。。我想找出每个类别获得的重复用户数。。
所以,在上面的例子中。。

12344, 2 (because user_id 1 and 3 are repeated users)
12345, 1 (user_id 2 is repeated user.. 1 is not as that user visited just once)

如何在sql/hive中解决这个问题?

ycl3bljg

ycl3bljg1#

想要了解你要找的东西有点困难,但要测试一下:

select category_id, count(user_id) from (Select category_id, user_id, count(table_primary_id) as 'total' from tablename group by category_id, user_id) a where total > 1 group by category_id

子查询统计用户访问某个类别的次数,而外部查询应统计多次访问某个类别的用户数。

uqxowvwt

uqxowvwt2#

例如。:

DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,user_id INT NOT NULL
,category_id INT NOT NULL
);

INSERT INTO my_table (user_id,category_id) VALUES
(1, 12344),
(1, 12344),
(1, 12345),
(2, 12345),
(2, 12345),
(3, 12344),
(3, 12344);

SELECT category_id
     , COUNT(*) total 
  FROM 
     ( SELECT x.* 
         FROM my_table x 
         JOIN my_table y 
           ON y.user_id = x.user_id 
          AND y.category_id = x.category_id 
          AND y.id < x.id
     ) a 
 GROUP 
    BY category_id;
+-------------+-------+
| category_id | total |
+-------------+-------+
|       12344 |     2 |
|       12345 |     1 |
+-------------+-------+

相关问题