SQL: finding quintiles using PARTITION BY

c6ubokkw  posted on 2021-06-24  in Hive

I have a transactions table t1 that looks like this:

store_id    cust_id     amount
100         1000        1.00
100         1000        2.05
100         1000        3.15
100         2000        5.00
100         2000        6.00
200         3000       16.00
200         3000       60.00
200         1000        6.00
300         1000        8.05
300         1000       15.15

I also have a reference table t2 that pairs each store with one or more industry peers:

store_id    peer_id
100         200
200         100
300         300

Here is the code that generates the tables above:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1(
    store_id int,
    cust_id int,
    amount float
);
INSERT INTO t1 VALUES(100,1000,1.00);
INSERT INTO t1 VALUES(100,1000,2.05);
INSERT INTO t1 VALUES(100,1000,3.15);
INSERT INTO t1 VALUES(100,2000,5.00);
INSERT INTO t1 VALUES(100,2000,6.00);
INSERT INTO t1 VALUES(200,3000,16.00);
INSERT INTO t1 VALUES(200,3000,60.00);
INSERT INTO t1 VALUES(200,1000,6.00);
INSERT INTO t1 VALUES(300,1000,8.05);
INSERT INTO t1 VALUES(300,1000,15.15);

DROP TABLE IF EXISTS t2;
CREATE TABLE t2(
    store_id int,
    peer_id int
);

INSERT INTO t2 VALUES(100,200);
INSERT INTO t2 VALUES(200,100);
INSERT INTO t2 VALUES(300,300);

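I want to assign each customer ID to a quintile based on the customer's total spend at a given store and at that store's peers. Something like this: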

SELECT
    a.cust_id
    , b.store_id
    , b.peer_id
    , NTILE(5) OVER(PARTITION BY b.store_id, b.peer_id ORDER BY sum(a.amount) DESC) as quintile 
    ,sum(a.amount) as total
FROM t1 a
LEFT JOIN t2 b
ON a.store_id=b.store_id
GROUP BY a.cust_id, b.store_id, b.peer_id;

However, this code does not work, because it assigns each customer to multiple quintiles. What is the best way to do this?
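To see where the multiple quintiles come from, the query can be run against the sample data. The sketch below is only an illustration under some assumptions: it uses Python's built-in sqlite3 with an in-memory SQLite database as a stand-in for Hive (window functions need SQLite 3.25 or later), and it moves the aggregation into a CTE named totals, which is not in the original query. The variable names are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1(store_id int, cust_id int, amount float);
INSERT INTO t1 VALUES
  (100,1000,1.00),(100,1000,2.05),(100,1000,3.15),
  (100,2000,5.00),(100,2000,6.00),
  (200,3000,16.00),(200,3000,60.00),(200,1000,6.00),
  (300,1000,8.05),(300,1000,15.15);
CREATE TABLE t2(store_id int, peer_id int);
INSERT INTO t2 VALUES (100,200),(200,100),(300,300);
""")

# Same grouping and partitioning as the query in the question,
# with the aggregation done first in a CTE.
query = """
WITH totals AS (
  SELECT a.cust_id, b.store_id, b.peer_id, SUM(a.amount) AS total
  FROM t1 a LEFT JOIN t2 b ON a.store_id = b.store_id
  GROUP BY a.cust_id, b.store_id, b.peer_id
)
SELECT cust_id, store_id, peer_id,
       NTILE(5) OVER (PARTITION BY store_id, peer_id
                      ORDER BY total DESC) AS quintile,
       total
FROM totals
ORDER BY cust_id, store_id
"""
rows = conn.execute(query).fetchall()

# Collect every (store, quintile) pair seen for each customer.
quintiles_by_customer = {}
for cust_id, store_id, peer_id, quintile, total in rows:
    quintiles_by_customer.setdefault(cust_id, []).append((store_id, quintile))

for cust_id, pairs in quintiles_by_customer.items():
    print(cust_id, pairs)
```

Customer 1000 shops at stores 100, 200 and 300, so the GROUP BY produces three rows for that customer, one per store, and each row is ranked only against the customers of that single store. Spending at the peer store is never added in, because the join matches on a.store_id = b.store_id rather than on the peer.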

toiithl6

I think you need:

-- One row per (store, customer); amounts are collected from the store's peers.
select t2.store_id, t1.cust_id, sum(t1.amount),
       ntile(5) over (partition by t2.store_id order by sum(t1.amount))
from t1 join
     t2
     on t1.store_id = t2.peer_id
group by t1.cust_id, t2.store_id;

Note: this assumes that each store is its own peer. If that is not the case, you need additional logic.
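The self-peer assumption matters for the sample data, where t2 pairs store 100 only with store 200. A minimal sketch of the effect, assuming Python's built-in sqlite3 with an in-memory SQLite database in place of Hive (the variable names are hypothetical):

```python
import sqlite3

# In-memory SQLite database as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1(store_id int, cust_id int, amount float);
INSERT INTO t1 VALUES
  (100,1000,1.00),(100,1000,2.05),(100,1000,3.15),
  (100,2000,5.00),(100,2000,6.00),
  (200,3000,16.00),(200,3000,60.00),(200,1000,6.00),
  (300,1000,8.05),(300,1000,15.15);
CREATE TABLE t2(store_id int, peer_id int);
INSERT INTO t2 VALUES (100,200),(200,100),(300,300);
""")

# Join on peer_id only: a store's own transactions are included
# only if t2 lists the store as its own peer.
query = """
SELECT t2.store_id, t1.cust_id, SUM(t1.amount) AS total
FROM t1 JOIN t2 ON t1.store_id = t2.peer_id
GROUP BY t2.store_id, t1.cust_id
ORDER BY t2.store_id, t1.cust_id
"""
totals = {(store_id, cust_id): total
          for store_id, cust_id, total in conn.execute(query)}

for key, total in totals.items():
    print(key, round(total, 2))
```

Customer 2000 bought only at store 100, yet shows up only under store 200 (as a peer customer) and not under store 100 itself. Store 300 is unaffected because t2 already lists it as its own peer.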
Edit:
The most efficient change is to make each store its own peer in t2. You could also use the following logic:

-- Match transactions at the store's peers or at the store itself.
select t2.store_id, t1.cust_id, sum(t1.amount),
       ntile(5) over (partition by t2.store_id order by sum(t1.amount))
from t1 join
     t2
     on t1.store_id = t2.peer_id or t2.store_id = t1.store_id
group by t1.cust_id, t2.store_id;

But the OR in the join condition will kill performance.
So:

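-- Add each store as its own peer in a subquery, so the join needs no OR.
-- If t2 may already list a store as its own peer (as store 300 does in the
-- sample data), use UNION instead of UNION ALL to avoid counting it twice.
select t2.store_id, t1.cust_id, sum(t1.amount),
       ntile(5) over (partition by t2.store_id order by sum(t1.amount))
from t1 join
     (select store_id, peer_id
      from t2
      union all
      select distinct store_id, store_id
      from t2
     ) t2
     on t1.store_id = t2.peer_id
group by t1.cust_id, t2.store_id;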

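Note: this assumes that all stores are in t2. The second subquery is only needed to produce the extra self-peer rows; it could select from t1 or from another table instead.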
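For reference, here is how the self-peer approach might look end to end on the sample data. This is a sketch, not a tested Hive query: it assumes Python's built-in sqlite3 with an in-memory SQLite database (3.25 or later for window functions), uses UNION rather than UNION ALL so that the existing (300, 300) row is not duplicated, orders by total DESC as in the question, and introduces the hypothetical CTE names peers and totals.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1(store_id int, cust_id int, amount float);
INSERT INTO t1 VALUES
  (100,1000,1.00),(100,1000,2.05),(100,1000,3.15),
  (100,2000,5.00),(100,2000,6.00),
  (200,3000,16.00),(200,3000,60.00),(200,1000,6.00),
  (300,1000,8.05),(300,1000,15.15);
CREATE TABLE t2(store_id int, peer_id int);
INSERT INTO t2 VALUES (100,200),(200,100),(300,300);
""")

query = """
WITH peers AS (
  -- every listed peer, plus each store as its own peer (deduplicated)
  SELECT store_id, peer_id FROM t2
  UNION
  SELECT store_id, store_id FROM t2
),
totals AS (
  -- total spend per customer across a store and its peers
  SELECT p.store_id, t1.cust_id, SUM(t1.amount) AS total
  FROM t1 JOIN peers p ON t1.store_id = p.peer_id
  GROUP BY p.store_id, t1.cust_id
)
SELECT store_id, cust_id, total,
       NTILE(5) OVER (PARTITION BY store_id ORDER BY total DESC) AS quintile
FROM totals
ORDER BY store_id, quintile
"""
rows = conn.execute(query).fetchall()

# Exactly one quintile per (store, customer) pair.
quintile = {(store_id, cust_id): q for store_id, cust_id, total, q in rows}
total_spend = {(store_id, cust_id): total for store_id, cust_id, total, q in rows}

for store_id, cust_id, total, q in rows:
    print(store_id, cust_id, round(total, 2), q)
```

Each customer now gets a single quintile per store, ranked against everyone who bought at that store or one of its peers. With only three customers in the peer group of stores 100 and 200, NTILE(5) hands out buckets 1 to 3 and leaves buckets 4 and 5 empty.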
