postgresql Postgres:表的“累积”视图

qaxu7uf2  于 2023-05-17  发布在  PostgreSQL
关注(0)|答案(2)|浏览(117)

我有一个像这样的变化表

CREATE TABLE IF NOT EXISTS changes (
    entity_id TEXT NOT NULL,
    column_id TEXT NOT NULL,
    value JSONB NOT NULL,
    updated_at TIMESTAMP NOT NULL
);
INSERT INTO changes VALUES ('1', 'height', to_jsonb(140), '01-01-2021 00:00:00'::TIMESTAMP);
INSERT INTO changes VALUES ('1', 'weight', to_jsonb(30), '01-01-2021 00:00:00'::TIMESTAMP);
INSERT INTO changes VALUES ('1', 'height', to_jsonb(145), '01-02-2021 00:00:00'::TIMESTAMP);
INSERT INTO changes VALUES ('1', 'weight', to_jsonb(34),'01-03-2021 00:00:00'::TIMESTAMP);

 entity_id | column_id | value |     updated_at      
-----------+-----------+-------+---------------------
 1         | height    | 140   | 2021-01-01 00:00:00
 1         | weight    | 30    | 2021-01-01 00:00:00
 1         | height    | 145   | 2021-01-02 00:00:00
 1         | weight    | 34    | 2021-01-03 00:00:00

我想得到这个表的累积视图

entity_id  | height | weight |     updated_at      
-----------+--------+--------+---------------------
 1         | 140    | 30     | 2021-01-01 00:00:00
 1         | 145    | 30     | 2021-01-02 00:00:00
 1         | 145    | 34     | 2021-01-03 00:00:00

我当前的查询看起来有效

SELECT
  entity_id,
  coalesce(change->'height', lag(change->'height', 1, null) over (partition by entity_id order by updated_at)) as height,
  coalesce(change->'weight', lag(change->'weight', 1, null) over (partition by entity_id order by updated_at)) as weight,
  updated_at
FROM (
    SELECT entity_id, json_object_agg(column_id, value) as change, updated_at FROM changes
    GROUP BY entity_id, updated_at
) as changes;

但是我不喜欢这里的json_object_agg,我确信有一种方法可以在没有冗余聚合的情况下实现它?我错过的一些使用窗口聚合函数的方法。
UPD. @SelVazi帮助使查询更好,但我觉得这不是最终的解决方案。

with cte as (
  SELECT
    entity_id,
    max(case when column_id = 'height' then value::int end) as height,
    max(case when column_id = 'weight' then value::int end) as weight,
    updated_at
  from changes
  GROUP by entity_id, updated_at
)
select
  entity_id,
  coalesce(height, lag(height) over (partition by entity_id order by updated_at)) as height,
  coalesce(weight, lag(weight) over (partition by entity_id order by updated_at)) as weight,
  updated_at
from cte;
2lpgd968

2lpgd9681#

这比看起来要复杂得多。可以使用条件聚合来将高度和重量旋转到列,但之后我们必须填充“缺失”的值。
我假设两个度量中的任何一个都可能存在多个日期的间隔,这使得lag()在Postgres中是错误的,因为它只能回顾预定义的行数(并且不能忽略null值)。
我们可以通过在示例数据的末尾添加一行来演示lag()的问题:
| 实体标识|列ID|价值|更新于|
| --------------|--------------|--------------|--------------|
| 1|高度|一百四十|2019 -01- 21 00:00:00|
| 1|重量|三十|2019 -01- 21 00:00:00|
| 1|高度|一百四十五|2019 -02-01 00:00:00|
| 1|重量|34人|2021-03-01 00:00:00|
| 1|重量|一百四十|2021-04-01 00:00:00|
一种解决方法是使用间隙和孤岛技术将“缺失”值放入以非null值开始的组中,然后该值成为新值。

select entity_id, updated_at,
    max(height) over(partition by entity_id, grp_height) height,
    max(weight) over(partition by entity_id, grp_weight) weight
from (
    select c.*,
        count(height) over (partition by entity_id order by updated_at) grp_height,
        count(weight) over (partition by entity_id order by updated_at) grp_weight
    from (
        select entity_id, updated_at,
            max(value::text) filter(where column_id = 'height') height,
            max(value::text) filter(where column_id = 'weight') weight
        from changes
        group by entity_id, updated_at
    ) c
) c
order by entity_id, updated_at

fiddle
| 实体标识|更新于|高度|重量|
| --------------|--------------|--------------|--------------|
| 1| 2019 -01- 21 00:00:00|一百四十|三十|
| 1| 2019 -02-01 00:00:00|一百四十五|三十|
| 1| 2021-03-01 00:00:00|一百四十五|34人|
| 1| 2021-04-01 00:00:00|一百四十五|一百四十|

2cmtqfgy

2cmtqfgy2#

不使用json_object_agg,你可以使用条件聚合来获取列的高度和宽度:

SELECT entity_id, MAX(CASE WHEN column_id = 'height' THEN value::text END) AS height
                , MAX(CASE WHEN column_id = 'weight' THEN value::text END) AS weight
                , updated_at
FROM changes
GROUP BY entity_id, updated_at

最后一个查询:

SELECT
  entity_id,
  coalesce(height, lag(height, 1, null) over (partition by entity_id order by updated_at)) as height,
  coalesce(weight, lag(weight, 1, null) over (partition by entity_id order by updated_at)) as weight,
  updated_at
FROM (
    SELECT entity_id, MAX(CASE WHEN column_id = 'height' THEN value::text END) AS height
                , MAX(CASE WHEN column_id = 'weight' THEN value::text END) AS weight
                , updated_at
    FROM changes
    GROUP BY entity_id, updated_at
) as changes;

结果:

entity_id   height  weight  updated_at
1           140     30      2021-01-01 00:00:00
1           145     30      2021-02-01 00:00:00
1           145     34      2021-03-01 00:00:00

相关问题