查找第一个“连续第x天”

bqjvbblv  于 2021-07-26  发布在  Java
关注(0)|答案(2)|浏览(383)

根据以下数据,

SELECT setseed(0.5);

WITH stuff AS (
    SELECT d::date, floor(random() * 5) AS v
    FROM generate_series('2021-01-01'::date, '2021-01-15'::date, '1 day'::interval) t(d)
)
SELECT d, v
FROM stuff
WHERE extract(isodow from d) BETWEEN 1 AND 5;

更具体地说,

d      | v
------------+---
 2021-01-01 | 1  -- 1st consecutive day with a positive `v`
 2021-01-04 | 1  -- 2nd consecutive day with a positive `v`
 2021-01-05 | 0  -- 0th consecutive day with a positive `v`
 2021-01-06 | 0  -- 0th consecutive day with a positive `v`
 2021-01-07 | 0  -- 0th consecutive day with a positive `v`
 2021-01-08 | 1  -- 1st consecutive day with a positive `v`
 2021-01-11 | 0  -- 0th consecutive day with a positive `v`
 2021-01-12 | 4  -- 1st consecutive day with a positive `v`
 2021-01-13 | 3  -- 2nd consecutive day with a positive `v`
 2021-01-14 | 1  -- 3rd consecutive day with a positive `v` (this!)
 2021-01-15 | 3  -- 4th consecutive day with a positive `v`
(11 rows)

我想找到第一个连续第三天积极的 v ". 在上面的例子中, 2021-01-12 通过 2021-01-14 都符合标准,所以预期答案是 2021-01-14 . 一 NULL 如果没有这一天,则应退还。
目前,我正在使用 pandas 将数据提取到python中并使用计数器计算答案,但是出于性能原因,我想切换到postgresql。一个明显的解决方案是使用递归cte,但我希望避免使用这种方法,而使用一些自定义聚合函数或pl/pgsql过程,因为该解决方案将成为更大查询的一部分,因此我必须尽量保持简单,以避免查询复杂性的爆炸。我是说,一个递归的cte在另一个递归的cte在一个 LATERAL ...

disbfnqx

disbfnqx1#

我使用过窗口函数,但是我不确定这种解决方案的性能:

SELECT setseed(0.5);

WITH stuff AS (
    SELECT d::date, floor(random() * 5) AS v
    FROM generate_series('2021-01-01'::date, '2021-01-15'::date, '1 day'::interval) t(d)
), tmp as (
    SELECT d, v, 
    LAG(v) OVER (ORDER BY d) AS v2, 
    LAG(v, 2) OVER (ORDER BY d) AS v3
    FROM stuff
    WHERE extract(isodow from d) BETWEEN 1 AND 5
)
SELECT d
FROM tmp
WHERE v > 0 and v2 > 0 AND v3 > 0
LIMIT 1;
unftdfkk

unftdfkk2#

事实证明,您可以在不知道pl/pgsql的情况下编写自定义聚合,下面是我所做的。一般来说,应该是这样 retval_consecutive[2] = x - 1 第一个“连续第x天”。

CREATE OR REPLACE FUNCTION first_xth_consecutive_label_transfn(retval_consecutive int[2], label_cond int[2])
    RETURNS int[2]
    LANGUAGE sql
    IMMUTABLE AS
$$
SELECT CASE
           WHEN retval_consecutive[1] IS NOT NULL THEN
               retval_consecutive -- already found
           WHEN label_cond[2] = 1 THEN
               ARRAY [CASE WHEN retval_consecutive[2] = 2 THEN label_cond[1] END, retval_consecutive[2] + 1]
           ELSE
               ARRAY [NULL, 0]
           END
$$;

CREATE OR REPLACE FUNCTION first_xth_consecutive_label_final(ans_consecutive int[2])
    RETURNS int
    LANGUAGE sql
    IMMUTABLE AS
$$
SELECT ans_consecutive[1];
$$;

DROP AGGREGATE IF EXISTS first_xth_consecutive_label(int[2]);
CREATE AGGREGATE first_xth_consecutive_label(int[2]) (
    sfunc = first_xth_consecutive_label_transfn,
    stype = int[2],
    finalfunc = first_xth_consecutive_label_final,
    initcond = '{NULL, 0}'
    );

用法:

SELECT setseed(0.5);

WITH stuff AS (
    SELECT d::date, floor(random() * 5) AS v
    FROM generate_series('2021-01-01'::date, '2021-01-15'::date, '1 day'::interval) t(d)
)
SELECT to_timestamp(first_xth_consecutive_label(ARRAY [extract(epoch FROM d)::int, CASE WHEN v > 0 THEN 1 ELSE 0 END]))::date
FROM stuff
WHERE extract(isodow from d) BETWEEN 1 AND 5;

相关问题