PostgreSQL SQL: How to filter records based on the most recent occurrence of an event within a group of people

dly7yett  posted on 2023-04-29  in PostgreSQL

I am working with the following tables in a Postgres database:

card_purchases

purchaser_id    purchase_date   card_id
44              12/10/2021      3
3               1/27/2022       1
19              1/31/2022       2
22              2/15/2022       1
4               6/9/2022        4
17              8/20/2022       2
19              2/4/2023        2
22              3/17/2023       1
3               3/19/2023       2
747             3/24/2023       2
1193            4/14/2023       1

card_templates

card_id   card_name
1         discount pass
2         exclusive pass
3         senior citizen discount
4         customer loyalty

discounts

discount_id     discount_name
1001            goodwill applied
1002            discount pass applied
1003            exclusive pass applied
1004            exclusive pass applied
1005            discount pass applied
1006            exclusive pass applied
1007            exclusive pass applied
1008            discount pass applied
1009            exclusive pass applied
1010            exclusive pass applied
1011            discount pass applied
1012            exclusive pass applied
1013            exclusive pass applied
1014            exclusive pass applied
1015            exclusive pass applied
1016            discount pass applied

discount_orders

order_id    discount_id         created_at  purchaser_id
1100            1002            1/28/2022   3
1101            1003            1/31/2022   19
1102            1004            2/4/2022    19
1103            1005            3/15/2022   22
1104            1006            3/17/2022   19
1105            1007            8/27/2022   17
1106            1008            8/30/2022   22
1107            1009            2/4/2023    19
1108            1010            2/19/2023   19
1109            1011            3/18/2023   22
1110            1012            3/19/2023   19
1111            1013            3/31/2023   747
1112            1014            4/5/2023    19
1113            1015            4/15/2023   747
1114            1016            4/20/2023   1193

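Here is a SQL Fiddle with all of the data: http://sqlfiddle.com/#!17/e64c2/1
For some background, our customers purchase "cards" (i.e., passes), which entitle them to buy our products at a discounted price. These cards are typically valid for 1 year, at which point the customer can purchase another card.
My goal is to see how many purchases each customer has made since buying their most recent card. The problem is that I don't know how to compute the count when a customer has purchased more than one card.
For example, customer 19 purchased two cards: one on 1/31/22 and another on 2/4/23. That customer has used the second card a total of 4 times.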

SELECT
    dord.purchaser_id                   AS user_id
    , COUNT(DISTINCT(dord.order_id))    AS count_orders
FROM
    discounts d
JOIN
    discount_orders dord
ON d.discount_id = dord.discount_id
WHERE
    d.discount_name LIKE '%pass%'
    AND dord.purchaser_id IN (
            SELECT
                cp.purchaser_id
            FROM
                card_purchases cp
            JOIN
                card_templates ct
            ON cp.card_id = ct.card_id
            WHERE
                ct.card_id IN (1, 2)
        )
GROUP BY
    dord.purchaser_id

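This code outputs a count of 7 purchases for customer 19, which is correct as a total. However, I am only interested in purchases made on or after the purchase of the *latest card*. For customer 19, that would be 4 purchases (because the second card was purchased on 2/4/23, and 4 purchases were made on or after that date).
How can we fix this code so that the count only includes orders placed on or after the customer's *most recent* card purchase?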


smtd7mpg1#

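The overall query logic looks good; we just need to adjust the subquery filter. Instead of IN, we can use a correlated subquery that returns the date of the current purchaser's most recent card purchase: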

-- "do" is a reserved word in PostgreSQL, so the alias dord is used instead
select dord.purchaser_id, count(*)
from discount_orders dord
inner join discounts d on d.discount_id = dord.discount_id
where d.discount_name like '%pass%'
  and dord.created_at >= (
        select max(cp.purchase_date)
        from card_purchases cp
        where cp.card_id in (1, 2)
          and cp.purchaser_id = dord.purchaser_id -- correlation
    )
group by dord.purchaser_id

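Note that, for the data you showed, distinct is not needed. Also, the join on card_templates in the subquery appears to be unnecessary.
We can also express this logic with a join: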

-- "do" is a reserved word in PostgreSQL, so the alias dord is used instead
select dord.purchaser_id, count(*)
from discount_orders dord
inner join discounts d on d.discount_id = dord.discount_id
inner join (
    select cp.purchaser_id, max(cp.purchase_date) purchase_date
    from card_purchases cp
    where cp.card_id in (1, 2)
    group by cp.purchaser_id -- one row per purchaser: the latest card purchase
) cp on cp.purchaser_id = dord.purchaser_id and cp.purchase_date <= dord.created_at
where d.discount_name like '%pass%'
group by dord.purchaser_id
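
The correlated-subquery approach can be checked against the sample data from the question. Below is a minimal sketch, assuming an in-memory SQLite database as a stand-in for Postgres and dates stored as ISO-8601 text (both are assumptions made so the check is self-contained; the real tables presumably use a date type). The alias dord is used for discount_orders.

```python
import sqlite3

# In-memory SQLite stands in for Postgres (an assumption for portability).
# Dates are ISO-8601 text, so ">=" on strings compares chronologically.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE card_purchases (purchaser_id INTEGER, purchase_date TEXT, card_id INTEGER);
CREATE TABLE discounts (discount_id INTEGER, discount_name TEXT);
CREATE TABLE discount_orders (order_id INTEGER, discount_id INTEGER,
                              created_at TEXT, purchaser_id INTEGER);
""")

# Sample rows copied from the question's tables.
conn.executemany("INSERT INTO card_purchases VALUES (?, ?, ?)", [
    (44, "2021-12-10", 3), (3, "2022-01-27", 1), (19, "2022-01-31", 2),
    (22, "2022-02-15", 1), (4, "2022-06-09", 4), (17, "2022-08-20", 2),
    (19, "2023-02-04", 2), (22, "2023-03-17", 1), (3, "2023-03-19", 2),
    (747, "2023-03-24", 2), (1193, "2023-04-14", 1),
])

discount_pass_ids = {1002, 1005, 1008, 1011, 1016}
conn.executemany("INSERT INTO discounts VALUES (?, ?)", [(1001, "goodwill applied")] + [
    (i, "discount pass applied" if i in discount_pass_ids else "exclusive pass applied")
    for i in range(1002, 1017)
])

conn.executemany("INSERT INTO discount_orders VALUES (?, ?, ?, ?)", [
    (1100, 1002, "2022-01-28", 3), (1101, 1003, "2022-01-31", 19),
    (1102, 1004, "2022-02-04", 19), (1103, 1005, "2022-03-15", 22),
    (1104, 1006, "2022-03-17", 19), (1105, 1007, "2022-08-27", 17),
    (1106, 1008, "2022-08-30", 22), (1107, 1009, "2023-02-04", 19),
    (1108, 1010, "2023-02-19", 19), (1109, 1011, "2023-03-18", 22),
    (1110, 1012, "2023-03-19", 19), (1111, 1013, "2023-03-31", 747),
    (1112, 1014, "2023-04-05", 19), (1113, 1015, "2023-04-15", 747),
    (1114, 1016, "2023-04-20", 1193),
])

# Count only orders placed on or after each purchaser's latest card purchase.
query = """
SELECT dord.purchaser_id, COUNT(*)
FROM discount_orders dord
JOIN discounts d ON d.discount_id = dord.discount_id
WHERE d.discount_name LIKE '%pass%'
  AND dord.created_at >= (
        SELECT MAX(cp.purchase_date)
        FROM card_purchases cp
        WHERE cp.card_id IN (1, 2)
          AND cp.purchaser_id = dord.purchaser_id  -- correlation
      )
GROUP BY dord.purchaser_id
ORDER BY dord.purchaser_id
"""
counts = dict(conn.execute(query).fetchall())
print(counts)  # {17: 1, 19: 4, 22: 1, 747: 2, 1193: 1}
```

Customer 19 comes out at 4, as the question expects. Customer 3 does not appear at all, because their only order predates their latest card purchase and the filter removes it.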

4ktjp1zp2#

You can use the row_number() window function to select only each customer's most recent card purchase.
Query:

with recent_card_purchases as
(
    SELECT
        *
        , row_number() OVER (PARTITION BY purchaser_id ORDER BY purchase_date DESC) AS rn
    FROM
        card_purchases cp
    JOIN
        card_templates ct
    ON cp.card_id = ct.card_id
    WHERE
        ct.card_id IN (1, 2)
)
SELECT
    dord.purchaser_id                   AS user_id
    , COUNT(DISTINCT(dord.order_id))    AS count_orders

FROM
    discounts d

JOIN
    discount_orders dord
ON d.discount_id = dord.discount_id
join recent_card_purchases rcp 
  on rcp.purchaser_id=dord.purchaser_id and dord.created_at >=rcp.purchase_date    
WHERE
    d.discount_name LIKE '%pass%' and rcp.rn=1
GROUP BY
    dord.purchaser_id

Output:

user_id     count_orders
17          1
19          4
22          1
747         2
1193        1


nx7onnlm3#

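This is a perfect opportunity to apply a lateral join. Find the card purchases of interest, then query the order counts. A scalar subquery would also work in this case, since there is only one value to return. A lateral join would allow additional columns to be added.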

-- lateral join
select * from (
    select purchaser_id, max(purchase_date) as last_purchase_date
    from card_purchases
    where card_id in (1, 2)
    group by purchaser_id
) mcp inner join lateral (
    select count(1) as cnt from discount_orders as o
    where o.purchaser_id = mcp.purchaser_id and o.created_at >= mcp.last_purchase_date
) mdo on true;

This is equivalent to:

-- scalar subquery
select *,
    (
    select count(1) as cnt from discount_orders as o
    where o.purchaser_id = mcp.purchaser_id and o.created_at >= mcp.last_purchase_date
    ) as cnt
from (
    select purchaser_id, max(purchase_date) as last_purchase_date
    from card_purchases
    where card_id in (1, 2)
    group by purchaser_id
) mcp;

I removed the filter on the discount name, since I don't believe it is necessary.
https://dbfiddle.uk/5QN_gb2s?hide=4
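
One behavior worth noting: because this query starts from card_purchases rather than discount_orders, a customer with no orders since their latest card still gets a row, with a count of 0. The sketch below illustrates this with the scalar-subquery form on a subset of the question's data (customers 3 and 19 only). It assumes SQLite as a stand-in for Postgres, ISO-8601 text dates, and a discount_orders table trimmed to the columns the query uses; SQLite has no LATERAL, so only the scalar form is shown.

```python
import sqlite3

# SQLite stand-in for Postgres (assumption); dates as ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE card_purchases (purchaser_id INTEGER, purchase_date TEXT, card_id INTEGER);
CREATE TABLE discount_orders (order_id INTEGER, created_at TEXT, purchaser_id INTEGER);
""")

# Subset of the question's rows: customers 3 and 19 only.
conn.executemany("INSERT INTO card_purchases VALUES (?, ?, ?)", [
    (3, "2022-01-27", 1), (19, "2022-01-31", 2),
    (19, "2023-02-04", 2), (3, "2023-03-19", 2),
])
conn.executemany("INSERT INTO discount_orders VALUES (?, ?, ?)", [
    (1100, "2022-01-28", 3), (1101, "2022-01-31", 19), (1102, "2022-02-04", 19),
    (1104, "2022-03-17", 19), (1107, "2023-02-04", 19), (1108, "2023-02-19", 19),
    (1110, "2023-03-19", 19), (1112, "2023-04-05", 19),
])

# Scalar subquery: one count per purchaser, driven by the latest card purchase.
query = """
SELECT mcp.purchaser_id,
       (SELECT COUNT(*)
        FROM discount_orders o
        WHERE o.purchaser_id = mcp.purchaser_id
          AND o.created_at >= mcp.last_purchase_date) AS cnt
FROM (
    SELECT purchaser_id, MAX(purchase_date) AS last_purchase_date
    FROM card_purchases
    WHERE card_id IN (1, 2)
    GROUP BY purchaser_id
) mcp
ORDER BY mcp.purchaser_id
"""
result = dict(conn.execute(query).fetchall())
print(result)  # {3: 0, 19: 4}
```

If zero-count customers should be excluded, filtering on cnt > 0 in an outer query is one option.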
