PostgreSQL: SQL gaps-and-islands approach fails with 3 rows

Posted by rur96b6h on 2023-04-29 in PostgreSQL

Given the following data:

# SELECT * FROM foobar;
 id | quantity |    time
----+----------+------------
  1 |       50 | 2022-01-01
  2 |      100 | 2022-01-02
  3 |       50 | 2022-01-03
  4 |       50 | 2022-01-04
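
To reproduce the table above, a setup script along these lines should work. The column types are an assumption inferred from the psql output; the original post does not show the table definition.

```sql
-- Hypothetical DDL: column types are guessed from the psql output above.
CREATE TABLE foobar (
    id       integer PRIMARY KEY,
    quantity integer NOT NULL,
    time     date    NOT NULL
);

-- The four sample rows exactly as listed in the question.
INSERT INTO foobar (id, quantity, time) VALUES
    (1,  50, '2022-01-01'),
    (2, 100, '2022-01-02'),
    (3,  50, '2022-01-03'),
    (4,  50, '2022-01-04');
```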

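(Note: row 4 is only there to illustrate the expected result; it is not needed to reproduce the problem.)
I want to extract three distinct groups based on two rules: 1) a new group should be created every time quantity changes, and 2) consecutive rows with the same quantity should be merged into the same group. The final result should look like this: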

id | quantity |    time    |  group_id
----+----------+------------+-------------
  1 |       50 | 2022-01-01 |           1
  2 |      100 | 2022-01-02 |           2
  3 |       50 | 2022-01-03 |           3
  4 |       50 | 2022-01-04 |           3

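I tried to implement a gaps-and-islands solution for this, but it fails because the groups end up merging rows with different quantities. Note that swapping rows 2 and 3 fixes the problem. Using DENSE_RANK leads to the same problem.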

# SELECT
    *,
    ROW_NUMBER() OVER (ORDER BY time) as global_rank,
    ROW_NUMBER() OVER (PARTITION BY quantity ORDER BY time) as qty_counter
FROM foobar;
 id | quantity |    time    | global_rank | qty_counter
----+----------+------------+-------------+-------------
  1 |       50 | 2022-01-01 |           1 |           1    # global_rank - qty_counter = 0 
  2 |      100 | 2022-01-02 |           2 |           1    # global_rank - qty_counter = 1 
  3 |       50 | 2022-01-03 |           3 |           2    # global_rank - qty_counter = 1 
  4 |       50 | 2022-01-04 |           4 |           3    # global_rank - qty_counter = 1 
(4 rows)

How can I change this query to get the expected result?

2hh7jdfx #1

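I find it simpler to solve this with lag(): the idea is to compare each quantity with the *previous* value (ordered by time), and to increment a counter whenever they do not match.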
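
Before the full query, the intermediate lag step can be inspected on its own. This is a minimal sketch against the foobar table from the question; the output shown is what psql prints for the four sample rows with its default NULL display (an empty cell).

```sql
-- Inner step only: pair each row with the previous row's quantity (by time).
SELECT f.*,
       lag(quantity) OVER (ORDER BY time) AS lag_quantity
FROM foobar f
ORDER BY time;

--  id | quantity |    time    | lag_quantity
-- ----+----------+------------+--------------
--   1 |       50 | 2022-01-01 |
--   2 |      100 | 2022-01-02 |           50
--   3 |       50 | 2022-01-03 |          100
--   4 |       50 | 2022-01-04 |           50
```

The first row has no predecessor, so its lag_quantity is NULL. A plain `<>` comparison against NULL yields NULL rather than true, so the first row would not be counted and numbering would start at 0. `IS DISTINCT FROM` treats NULL as an ordinary comparable value, which is why the first row opens group 1.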

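-- grp counts the rows so far (ordered by time) whose quantity differs from the previous row's
select id, quantity, time,
    count(*) filter(where quantity is distinct from lag_quantity) over(order by time) as grp
from (
    -- lag_quantity is the previous row's quantity, or null for the first row
    select f.*, lag(quantity) over(order by time) as lag_quantity
    from foobar f
) f
order by time;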

 id | quantity |    time    | grp
----+----------+------------+-----
  1 |       50 | 2022-01-01 |   1
  2 |      100 | 2022-01-02 |   2
  3 |       50 | 2022-01-03 |   3
  4 |       50 | 2022-01-04 |   3
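
The ROW_NUMBER approach from the question can also be kept. In the usual gaps-and-islands formulation, the difference global_rank - qty_counter is not an island key by itself; it identifies an island only together with the partitioning column (here quantity). In the question's output, rows 2 to 4 share the difference 1, but row 2 has quantity 100 while rows 3 and 4 have 50. The following is a sketch and not part of the original answer: the CTE names ranked and islands and the columns diff and island_start are made up for illustration, and it assumes time values are unique.

```sql
WITH ranked AS (
    SELECT
        f.*,
        -- Same two counters as in the question, collapsed into their difference.
        ROW_NUMBER() OVER (ORDER BY time)
          - ROW_NUMBER() OVER (PARTITION BY quantity ORDER BY time) AS diff
    FROM foobar f
),
islands AS (
    SELECT
        ranked.*,
        -- An island is identified by the pair (quantity, diff), not by diff alone.
        MIN(time) OVER (PARTITION BY quantity, diff) AS island_start
    FROM ranked
)
SELECT
    id,
    quantity,
    time,
    -- Number the islands in chronological order of their first row.
    DENSE_RANK() OVER (ORDER BY island_start) AS group_id
FROM islands
ORDER BY time;

--  id | quantity |    time    | group_id
-- ----+----------+------------+----------
--   1 |       50 | 2022-01-01 |        1
--   2 |      100 | 2022-01-02 |        2
--   3 |       50 | 2022-01-03 |        3
--   4 |       50 | 2022-01-04 |        3
```

On the sample data the island keys are (50, 0), (100, 1) and (50, 1), which gives the three expected groups. The lag() version needs fewer window passes and is easier to read, so this variant is mainly useful when the rest of a query is already built around the two row numbers.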

