PostgreSQL SQL: how to get the relative position of a value within a given row

vhipe2zx  posted on 2023-05-17  in PostgreSQL

I'm working with a Postgres table named orders that looks like this:

user_id   product       order_date
1         pants         7/1/2022
2         shirt         6/1/2022
1         socks         3/17/2023
3         pants         2/17/2023
4         shirt         3/13/2023
2         pants         8/15/2022
1         hat           4/15/2022
5         hat           3/14/2023
2         socks         12/3/2022
3         shirt         4/15/2023
4         socks         1/15/2023
4         pants         4/19/2023
5         shirt         5/2/2023
5         belt          5/15/2023

Here is a DB Fiddle with the data: https://www.db-fiddle.com/f/uNGjP7gpKwdPGrJ7XmT7k3/2
I have produced a table that shows the *sequence* of each customer's orders:

user_id   first_order   second_order    third_order
1         hat           pants           socks
2         shirt         pants           socks
3         pants         shirt           <null>
4         socks         shirt           pants
5         hat           shirt           belt

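For example, customer 1 bought a hat first, then pants, and finally socks.
I'd like a row-level indicator that tells me whether a given customer bought one product before buying another. For example, I want to flag whether a customer bought a shirt before buying pants.
The desired output looks like this: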

user_id   first_order   second_order    third_order     shirt_before_pants
1         hat           pants           socks           false
2         shirt         pants           socks           true
3         pants         shirt           <null>          false
4         socks         shirt           pants           true
5         hat           shirt           belt            false

Is there a way to get the relative position of a given value at the row level?
Thanks for your help… -Rachel

bejyjqdl1#

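We can enumerate each customer's orders with row_number(), then use conditional aggregation to generate the new columns. To check whether one product was bought before another, we can compare the minimum order dates of the two products: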

select user_id,
    max(product) filter(where rn = 1) product_1,
    max(product) filter(where rn = 2) product_2,
    max(product) filter(where rn = 3) product_3,
    ( 
          min(order_date) filter(where product = 'shirt') 
        < min(order_date) filter(where product = 'pants')
    ) shirt_before_pants
from (
    select o.*, row_number() over(partition by user_id order by order_date) rn
    from orders o
) o
group by user_id
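
One thing to watch in the query above: when a customer never bought one of the two products, the corresponding min(...) is NULL, so the comparison yields NULL rather than false (customers 1 and 5 in the sample data). Wrapping the comparison in coalesce gives the false shown in the desired output. The sketch below checks this on the sample data. It is only an illustration under stated assumptions: SQLite (through Python's sqlite3 module) stands in for Postgres, dates are rewritten as ISO strings, CASE expressions replace FILTER, and the helper name shirt_before_pants is made up for this example.

```python
import sqlite3

# Sample rows from the question. Dates are rewritten as ISO strings so that
# plain text comparison orders them correctly (an assumption of this sketch).
ORDERS = [
    (1, "pants", "2022-07-01"), (2, "shirt", "2022-06-01"),
    (1, "socks", "2023-03-17"), (3, "pants", "2023-02-17"),
    (4, "shirt", "2023-03-13"), (2, "pants", "2022-08-15"),
    (1, "hat", "2022-04-15"), (5, "hat", "2023-03-14"),
    (2, "socks", "2022-12-03"), (3, "shirt", "2023-04-15"),
    (4, "socks", "2023-01-15"), (4, "pants", "2023-04-19"),
    (5, "shirt", "2023-05-02"), (5, "belt", "2023-05-15"),
]

# raw_flag is the bare comparison (NULL when a product is missing);
# shirt_before_pants collapses NULL to 0 (false) with COALESCE.
QUERY = """
SELECT user_id,
       MIN(CASE WHEN product = 'shirt' THEN order_date END)
     < MIN(CASE WHEN product = 'pants' THEN order_date END) AS raw_flag,
       COALESCE(
           MIN(CASE WHEN product = 'shirt' THEN order_date END)
         < MIN(CASE WHEN product = 'pants' THEN order_date END),
           0) AS shirt_before_pants
FROM orders
GROUP BY user_id
ORDER BY user_id
"""


def shirt_before_pants(rows):
    """Load rows into an in-memory SQLite table and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE orders (user_id INTEGER, product TEXT, order_date TEXT)"
    )
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for user_id, raw_flag, flag in shirt_before_pants(ORDERS):
    print(user_id, raw_flag, bool(flag))
```

In Postgres itself, the same idea would be to wrap the original expression: coalesce(min(order_date) filter(where product = 'shirt') < min(order_date) filter(where product = 'pants'), false).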

e0bqpujr2#

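If ...

  • ... "before" is supposed to mean "immediately before", with no other orders in between
  • ... an array of products instead of a separate column per product is acceptable
  • ... you have a separate "users" table
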
SELECT o.*
FROM   users u
CROSS  JOIN LATERAL (
   SELECT o.user_id
        , array_agg(o.product) AS products
        , bool_or(o.combo) AS shirt_before_pants
   FROM  (
      SELECT o.user_id, o.product::text
           , o.product = 'pants' AND lag(o.product) OVER (ORDER BY o.order_date) = 'shirt' AS combo
      FROM   orders o
      WHERE  o.user_id = u.user_id
      ORDER  BY o.order_date
      LIMIT  3  -- cutoff
      ) o
   GROUP  BY 1
   ) o
ORDER  BY u.user_id;

fiddle
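The beauty of it: to cover a different number of orders, you only change the LIMIT. 'pants' and 'shirt' each need to be changed in just one place.
Because the subquery is sorted, the products in the output array are sorted as well. See: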

  • Creating an array in a SELECT
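
To see what "immediately before" means on the sample data, here is a minimal sketch of the lag() comparison. It is not the query above: it assumes SQLite via Python's sqlite3 as a stand-in for Postgres (window functions need SQLite 3.25 or later), uses PARTITION BY instead of the LATERAL join, omits the LIMIT cutoff, and uses MAX over 0/1 values in place of bool_or. Customer 6 and the function name immediately_before are made up for illustration.

```python
import sqlite3

# Sample rows from the question, dates rewritten as ISO strings (assumption).
ORDERS = [
    (1, "pants", "2022-07-01"), (2, "shirt", "2022-06-01"),
    (1, "socks", "2023-03-17"), (3, "pants", "2023-02-17"),
    (4, "shirt", "2023-03-13"), (2, "pants", "2022-08-15"),
    (1, "hat", "2022-04-15"), (5, "hat", "2023-03-14"),
    (2, "socks", "2022-12-03"), (3, "shirt", "2023-04-15"),
    (4, "socks", "2023-01-15"), (4, "pants", "2023-04-19"),
    (5, "shirt", "2023-05-02"), (5, "belt", "2023-05-15"),
]

# Hypothetical customer 6: shirt, then hat, then pants. The shirt comes
# before the pants, but not immediately before.
EXTRA = [
    (6, "shirt", "2023-01-01"),
    (6, "hat", "2023-02-01"),
    (6, "pants", "2023-03-01"),
]

# combo is 1 only on a 'pants' row whose previous order was a 'shirt'.
QUERY = """
SELECT user_id, COALESCE(MAX(combo), 0) AS shirt_before_pants
FROM (
    SELECT user_id,
           product = 'pants'
           AND LAG(product) OVER (
                   PARTITION BY user_id ORDER BY order_date
               ) = 'shirt' AS combo
    FROM orders
)
GROUP BY user_id
ORDER BY user_id
"""


def immediately_before(rows):
    """Return (user_id, flag) pairs; flag is 1 if a shirt directly preceded pants."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE orders (user_id INTEGER, product TEXT, order_date TEXT)"
    )
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for user_id, flag in immediately_before(ORDERS + EXTRA):
    print(user_id, bool(flag))
```

On the question's data this agrees with the desired output, because customers 2 and 4 happen to buy the shirt directly before the pants. The made-up customer 6 is where the two definitions differ: a comparison of first purchase dates would report true, while the "immediately before" check reports false.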

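If you have an index on orders(user_id, order_date), or better yet on orders(user_id, order_date) INCLUDE (product), the query performs well on big tables with *many* orders per user.
If you don't have a users table (you should have one), you can create it like this: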

CREATE TABLE users AS
SELECT DISTINCT user_id
FROM   orders
ORDER  BY user_id;  -- optional

Or read about faster techniques here:

  • Optimize GROUP BY query to retrieve latest row per user

nvbavucw3#

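This approach uses the window function ROW_NUMBER (DENSE_RANK would also work), which assigns a row number to each row within its user_id partition, ordered by order_date. To determine whether the shirt was bought before the pants, we can compare the generated row numbers (rn) of those products: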

With cte as (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) AS rn
  FROM orders
)
select user_id, max(case when rn = 1 then product end) as first_order,
                max(case when rn = 2 then product end) as second_order,
                max(case when rn = 3 then product end) as third_order,
                MAX(case when product = 'shirt' then rn end) 
                < MAX(case when product = 'pants' then rn end) as shirt_before_pants
from cte
GROUP BY user_id;
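
In the sample data each customer buys each product at most once, so it makes no difference whether the row numbers are aggregated with MAX or MIN. If a product can be bought repeatedly, the choice matters: MAX compares the last purchases, MIN the first ones. The sketch below shows the difference for a made-up customer 9 who buys pants, then a shirt, then pants again. It assumes SQLite via Python's sqlite3 as a stand-in for Postgres (window functions need SQLite 3.25 or later); the column names by_last_purchase and by_first_purchase and the function name compare_variants are hypothetical.

```python
import sqlite3

# Hypothetical customer 9 (not in the question's data):
# pants first, then a shirt, then pants again.
ROWS = [
    (9, "pants", "2023-01-10"),
    (9, "shirt", "2023-02-10"),
    (9, "pants", "2023-03-10"),
]

QUERY = """
WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) AS rn
    FROM orders
)
SELECT user_id,
       -- last shirt vs. last pants (the aggregation used in the answer)
       MAX(CASE WHEN product = 'shirt' THEN rn END)
     < MAX(CASE WHEN product = 'pants' THEN rn END) AS by_last_purchase,
       -- first shirt vs. first pants
       MIN(CASE WHEN product = 'shirt' THEN rn END)
     < MIN(CASE WHEN product = 'pants' THEN rn END) AS by_first_purchase
FROM cte
GROUP BY user_id
ORDER BY user_id
"""


def compare_variants(rows):
    """Return (user_id, by_last_purchase, by_first_purchase) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE orders (user_id INTEGER, product TEXT, order_date TEXT)"
    )
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for user_id, by_last, by_first in compare_variants(ROWS):
    print(user_id, bool(by_last), bool(by_first))
```

For customer 9 the MAX variant reports true (the shirt at position 2 precedes the last pants at position 3), while the MIN variant reports false (the first pants at position 1 precedes the shirt). Which one is right depends on what "before" should mean for repeat purchases.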

omvjsjqw4#

The array_position function may help here:

WITH 

first_orders AS (
  SELECT "user_id", "product", MIN("order_date") AS "order_date"
  FROM "orders"
  GROUP BY "user_id", "product"),

product_arrays AS (
  SELECT "user_id", 
    array_agg(product ORDER BY order_date) AS "products"
  FROM first_orders
  GROUP BY "user_id")
  
SELECT * 
FROM product_arrays
WHERE array_position(products, 'shirt') 
         < array_position(products, 'pants')

Alternatively, the following also works:

WITH 

product_arrays AS (
  SELECT "user_id", 
    array_agg(product ORDER BY order_date) AS "products"
  FROM orders
  GROUP BY "user_id")
  
SELECT * 
FROM product_arrays
WHERE array_position(products, 'shirt') 
         < array_position(products, 'pants')
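
Both queries return only the customers for whom the condition holds, because the comparison sits in the WHERE clause; array_position returns NULL when the value is absent, so customers missing either product drop out as well. Since array_position returns the position of the first match, the two variants should select the same customers. To get a true/false flag for every customer, as in the desired output, the comparison could move into the SELECT list with a fallback to false. Below is a small Python model of that logic, not a Postgres query: the helper array_position mimics the Postgres function (1-based, None when absent), and shirt_before_pants is a made-up name.

```python
from collections import defaultdict

# Sample rows from the question, dates rewritten as ISO strings (assumption)
# so that sorting the strings sorts the orders chronologically.
ORDERS = [
    (1, "pants", "2022-07-01"), (2, "shirt", "2022-06-01"),
    (1, "socks", "2023-03-17"), (3, "pants", "2023-02-17"),
    (4, "shirt", "2023-03-13"), (2, "pants", "2022-08-15"),
    (1, "hat", "2022-04-15"), (5, "hat", "2023-03-14"),
    (2, "socks", "2022-12-03"), (3, "shirt", "2023-04-15"),
    (4, "socks", "2023-01-15"), (4, "pants", "2023-04-19"),
    (5, "shirt", "2023-05-02"), (5, "belt", "2023-05-15"),
]


def array_position(items, value):
    """Model of Postgres array_position: 1-based index of the first match, None if absent."""
    return items.index(value) + 1 if value in items else None


def shirt_before_pants(rows):
    """Return {user_id: bool} for every customer, not only the matching ones."""
    # Equivalent of array_agg(product ORDER BY order_date) ... GROUP BY user_id
    products_by_user = defaultdict(list)
    for user_id, product, _order_date in sorted(rows, key=lambda r: (r[0], r[2])):
        products_by_user[user_id].append(product)

    flags = {}
    for user_id, products in products_by_user.items():
        shirt = array_position(products, "shirt")
        pants = array_position(products, "pants")
        # A missing product makes the SQL comparison NULL; treat that as False,
        # which is what coalesce(..., false) would do.
        flags[user_id] = shirt is not None and pants is not None and shirt < pants
    return flags


print(shirt_before_pants(ORDERS))
```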
