PostgreSQL: rolling data using averages

Posted by lrpiutwd on 2023-08-04 in PostgreSQL

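Can PostgreSQL compute an average over rolling data? I have a case where I need to average the values of 5 days and fill in all the days while waiting for the actual values. So basically, if today is day 6, my formula uses days 1-5; then day 7 uses days 2-6, where the day-6 value is the average computed today, and so on. You can see the values below. I have been searching for whether SQL can do this but could not find any function. I only found recursive, but it is not available in Postgres.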

Sample table

[image: sample table data]

Answer #1 (ajsxfq5m)

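The key to calculating a rolling mean with substitution of missing values is to maintain a buffer that holds the values contributing to the mean. The following query demonstrates how to calculate the rolling mean using a recursive CTE and an array that holds the contributing values: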

WITH RECURSIVE
  -- generate sample data as per original post
  t AS (
    SELECT s.n AS day_number, (CASE WHEN s.n <= 5 THEN s.n + 1 END)::float AS day_value
      FROM generate_series(1, 9) s(n)),
  cte AS (
    SELECT t.day_number,
           t.day_value AS original_day_value,
           t.day_value,
           COALESCE(t.day_value, 0) AS rolling_sum,
           CASE WHEN t.day_value IS NULL THEN 0 ELSE 1 END AS sample_count,
           -- array size determines maximum number of samples contributing to rolling mean
           array_fill(NULL::float, ARRAY[4]) || ARRAY[t.day_value] AS samples
      FROM t
      WHERE t.day_number = 1
    UNION ALL
    SELECT t.day_number,
           t.day_value AS original_day_value,
           w.day_value,
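           -- maintain rolling sum to eliminate need to repeatedly sum samples; once the buffer is full the
           -- subscript resolves to 1 (the oldest sample), otherwise it is out of range and yields NULL
           cte.rolling_sum - COALESCE(cte.samples[cte.sample_count - cardinality(cte.samples) + 1], 0) + COALESCE(w.day_value, 0) AS rolling_sum,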
           -- because missing values have substitutions, sample_count can never decrease
           LEAST(cardinality(cte.samples),
                 (cte.sample_count + CASE WHEN w.day_value IS NOT NULL THEN 1 ELSE 0 END)) AS sample_count,
           -- shift out the oldest sample and append the newest
           cte.samples[2:] || w.day_value
      FROM t
      JOIN cte
        ON t.day_number = cte.day_number + 1
      CROSS JOIN LATERAL (
        -- compute the new day value once instead of repeating the calculation in the enclosing query's select list
        SELECT COALESCE(t.day_value, cte.rolling_sum / NULLIF(cte.sample_count, 0)) AS day_value) w)
SELECT cte.*
  FROM cte
  ORDER BY cte.day_number;

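I don't have access to a PostgreSQL 9.1 database, so I have only run this on PostgreSQL 15.3. Some changes might be required for older versions; however, the general concept still applies.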
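As a hand check of what the query should produce, the sketch below recomputes the filled values for the sample data generated above (days 1 to 5 hold 2 through 6, days 6 to 9 are missing). It is plain arithmetic added for illustration, not part of the original answer; the column aliases are made up, and the comments give the values up to display precision.

```sql
-- Each missing day is the mean of the five preceding day values,
-- where earlier filled values feed into later ones.
SELECT (2 + 3 + 4 + 5 + 6) / 5.0       AS day_6,  -- 4
       (3 + 4 + 5 + 6 + 4) / 5.0       AS day_7,  -- 4.4
       (4 + 5 + 6 + 4 + 4.4) / 5.0     AS day_8,  -- 4.68
       (5 + 6 + 4 + 4.4 + 4.68) / 5.0  AS day_9;  -- 4.816

-- The buffer update used in the recursive step:
-- drop the oldest sample, append the newest.
SELECT (ARRAY[2, 3, 4, 5, 6]::float[])[2:] || 4::float AS samples_after_day_6;  -- {3,4,5,6,4}
```

Slices with an omitted bound, such as `[2:]`, were added in PostgreSQL 9.6, so on older versions the upper bound would have to be written out explicitly.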

Answer #2 (s4n0splo)

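You can get the desired 5-day running average using the window version of the avg() function; see the window-function documentation and the demo linked in the original answer.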

select day
     , value
     , case when rn>5
            then p5::numeric(5,2)
            else null
       end "prior 5 day average"
  from (select * 
             , avg(value) over win p5
             , row_number() over (order by day) rn  -- order explicitly so rn follows day order
          from sample_table
        window win as (order by day 
                       range between 5 preceding
                                 and 1 preceding
                      )   
       ) sq;

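Well, at least you get what you asked for: the average of the prior 5 days. However, you cannot get what you have in your sample table, because days 6 through 10 have no valid values, and I am not going to guess which values would lead to your posted results.
Note: the demo actually runs on v15, since db<>fiddle no longer supports v9.1, but the documentation references are for v9.1.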
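To try the query above, a table along these lines is needed. This is a minimal setup sketch, not taken from the original answer: the column types are assumptions, and the values simply reuse 2 through 6 for days 1 to 5 from the first answer's sample data, leaving days 6 to 10 empty.

```sql
-- Hypothetical setup for the window-function query;
-- names match the query, types are assumed.
CREATE TABLE sample_table (
    day   integer PRIMARY KEY,
    value numeric              -- NULL until the actual value arrives
);

INSERT INTO sample_table (day, value)
SELECT s.n,
       CASE WHEN s.n <= 5 THEN s.n + 1 END   -- days 6..10 stay NULL
  FROM generate_series(1, 10) AS s(n);
```

With this data, avg() skips the NULL rows instead of substituting earlier averages, so the averages for day 7 onward are taken over fewer real values and differ from the filled series produced by the recursive query. The `range between 5 preceding and 1 preceding` frame with a numeric offset needs PostgreSQL 11 or later.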
