PostgreSQL: TimescaleDB interpolated average with an additional GROUP BY

cyvaqqii, posted on 2023-01-17 in PostgreSQL

Is there a way to add an extra group by to the toolkit_experimental.interpolated_average function? Say my data contains power measurements from different sensors; how can I add a group by on sensor_id?

with s as (
  select sensor_id,
    time_bucket('30 minutes', timestamp) bucket,
    time_weight('LOCF', timestamp, value) agg
  from
    measurements m
    inner join sensor_definition sd on m.sensor_id = sd.id
  where asset_id = '<battery_id>' and sensor_name = 'power' and
    timestamp between '2023-01-05 23:30:00' and '2023-01-07 00:30:00'
  group by sensor_id, bucket)
select sensor_id,
  bucket,
  toolkit_experimental.interpolated_average(
      agg,
      bucket,
      '30 minutes'::interval,
      lag(agg) over (order by bucket),
      lead(agg) over (order by bucket)
    )
from s
group by sensor_id;

The query above does not work, because I would also need to add bucket and agg as group by columns.
You can find the relevant schema below.

create table measurements
(
    sensor_id uuid                     not null,
    timestamp timestamp with time zone not null,
    value     double precision         not null
);

create table sensor_definition
(
    id          uuid default uuid_generate_v4() not null
        primary key,
    asset_id    uuid                            not null,
    sensor_name varchar(256)                    not null,
    sensor_type varchar(256)                    not null,
    unique (asset_id, sensor_name, sensor_type)
);

Any suggestions?

r8uurelv (answer #1)

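This is a great question and a cool use case. There is definitely a way to do this! I like that you put the CTE at the top, though I prefer to give CTEs more descriptive names. The join looks like a good way to do the selection, and at some point in the future you could fairly easily swap the on-the-fly aggregation for a continuous aggregate and run the same join against the continuous aggregate, so that's great!
The only thing you need to change is the window clause of the lead and lag functions, so that they know they are not working over the whole ordered data set but over one sensor at a time. Once you add PARTITION BY sensor_id there, you don't need the outer group by clause at all!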

WITH weighted_sensor AS (
  SELECT 
    sensor_id,
    time_bucket('30 minutes', timestamp) bucket,
    time_weight('LOCF', timestamp, value) agg 
  FROM
    measurements m
    INNER JOIN sensor_definition sd ON m.sensor_id = sd.id
  WHERE asset_id = '<battery_id>' AND sensor_name = 'power' and
    timestamp between '2023-01-05 23:30:00' and '2023-01-07 00:30:00'
  GROUP BY sensor_id, bucket)
SELECT 
  sensor_id,
  bucket,
  toolkit_experimental.interpolated_average(
      agg,
      bucket,
      '30 minutes'::interval,
      lag(agg) OVER (PARTITION BY sensor_id ORDER BY bucket),
      lead(agg) OVER (PARTITION BY sensor_id ORDER BY bucket)
    )
FROM weighted_sensor;
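
To see why the PARTITION BY matters, here is a minimal sketch in plain PostgreSQL (no TimescaleDB needed). The inline readings table, its values, and the column names prev_any_sensor and prev_same_sensor are made up for illustration; they are not part of the question's schema.

```sql
-- Hypothetical inline data: two sensors with three buckets each.
WITH readings (sensor_id, bucket, val) AS (
  VALUES
    ('a', 1, 10.0),
    ('a', 2, 11.0),
    ('a', 3, 12.0),
    ('b', 1, 20.0),
    ('b', 2, 21.0),
    ('b', 3, 22.0)
)
SELECT
  sensor_id,
  bucket,
  val,
  -- No PARTITION BY: the previous row is simply the one sorted just before
  -- this one, which can belong to a different sensor.
  lag(val) OVER (ORDER BY bucket, sensor_id) AS prev_any_sensor,
  -- With PARTITION BY: the previous row always comes from the same sensor.
  lag(val) OVER (PARTITION BY sensor_id ORDER BY bucket) AS prev_same_sensor
FROM readings
ORDER BY sensor_id, bucket;
```

For sensor 'a' in bucket 2, prev_any_sensor picks up 20.0 from sensor 'b', while prev_same_sensor gives 10.0 from sensor 'a' itself. The same reasoning applies to the lag(agg) and lead(agg) arguments passed to interpolated_average.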

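You can also pull the window clause out into a separate, named WINDOW clause in the query. That is especially useful when you use it more than once. For example, if you also want to use the integral function to get the total energy used over a period, you can do this: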

WITH weighted_sensor AS (
  SELECT 
    sensor_id,
    time_bucket('30 minutes', timestamp) bucket,
    time_weight('LOCF', timestamp, value) agg 
  FROM
    measurements m
    INNER JOIN sensor_definition sd ON m.sensor_id = sd.id
  WHERE asset_id = '<battery_id>' AND sensor_name = 'power' and
    timestamp between '2023-01-05 23:30:00' and '2023-01-07 00:30:00'
  GROUP BY sensor_id, bucket)
SELECT 
  sensor_id,
  bucket,
  toolkit_experimental.interpolated_average(
      agg,
      bucket,
      '30 minutes'::interval,
      lag(agg) OVER sensor_times,
      lead(agg) OVER sensor_times
    ),
  toolkit_experimental.interpolated_integral(
      agg,
      bucket,
      '30 minutes'::interval,
      lag(agg) OVER sensor_times,
      lead(agg) OVER sensor_times,
      'hours'
    )
FROM weighted_sensor
WINDOW sensor_times AS (PARTITION BY sensor_id ORDER BY bucket);

I used hours as the unit because I figured energy is usually measured in watt-hours or something similar...
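
The continuous aggregate idea mentioned at the start of this answer could look roughly like the sketch below. It assumes measurements has already been turned into a hypertable, and the view name sensor_power_30m and the alias avg_power are hypothetical. It also filters on the bucket start rather than the raw timestamp, so the edges of the range may differ slightly from the original query. Treat it as a starting point, not a tested setup.

```sql
-- Assumption: measurements is already a hypertable, e.g. created with
--   SELECT create_hypertable('measurements', 'timestamp');
-- The view name sensor_power_30m is hypothetical.
CREATE MATERIALIZED VIEW sensor_power_30m
WITH (timescaledb.continuous) AS
SELECT
  sensor_id,
  time_bucket('30 minutes', timestamp) AS bucket,
  time_weight('LOCF', timestamp, value) AS agg
FROM measurements
GROUP BY sensor_id, time_bucket('30 minutes', timestamp);

-- Same join and window as before, now reading the pre-computed summaries.
SELECT
  c.sensor_id,
  c.bucket,
  toolkit_experimental.interpolated_average(
      c.agg,
      c.bucket,
      '30 minutes'::interval,
      lag(c.agg) OVER sensor_times,
      lead(c.agg) OVER sensor_times
    ) AS avg_power
FROM sensor_power_30m c
  INNER JOIN sensor_definition sd ON c.sensor_id = sd.id
WHERE sd.asset_id = '<battery_id>' AND sd.sensor_name = 'power'
  AND c.bucket BETWEEN '2023-01-05 23:30:00' AND '2023-01-07 00:30:00'
WINDOW sensor_times AS (PARTITION BY c.sensor_id ORDER BY c.bucket);
```

Keeping the join out of the view definition means the continuous aggregate only depends on the hypertable, and the sensor metadata is looked up at query time.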
