根据以下数据,
SELECT setseed(0.5);
WITH stuff AS (
SELECT d::date, floor(random() * 5) AS v
FROM generate_series('2021-01-01'::date, '2021-01-15'::date, '1 day'::interval) t(d)
)
SELECT d, v
FROM stuff
WHERE extract(isodow from d) BETWEEN 1 AND 5;
更具体地说,
d | v
------------+---
2021-01-01 | 1 -- 1st consecutive day with a positive `v`
2021-01-04 | 1 -- 2nd consecutive day with a positive `v`
2021-01-05 | 0 -- 0th consecutive day with a positive `v`
2021-01-06 | 0 -- 0th consecutive day with a positive `v`
2021-01-07 | 0 -- 0th consecutive day with a positive `v`
2021-01-08 | 1 -- 1st consecutive day with a positive `v`
2021-01-11 | 0 -- 0th consecutive day with a positive `v`
2021-01-12 | 4 -- 1st consecutive day with a positive `v`
2021-01-13 | 3 -- 2nd consecutive day with a positive `v`
2021-01-14 | 1 -- 3rd consecutive day with a positive `v` (this!)
2021-01-15 | 3 -- 4th consecutive day with a positive `v`
(11 rows)
我想找到第一个连续第三天积极的 v
". 在上面的例子中, 2021-01-12
通过 2021-01-14
都符合标准,所以预期答案是 2021-01-14
. 一 NULL
如果没有这一天,则应退还。
目前,我正在使用 pandas
将数据提取到python中并使用计数器计算答案,但是出于性能原因,我想切换到postgresql。一个明显的解决方案是使用递归cte,但我希望避免使用这种方法,而使用一些自定义聚合函数或pl/pgsql过程,因为该解决方案将成为更大查询的一部分,因此我必须尽量保持简单,以避免查询复杂性的爆炸。我是说,一个递归的cte在另一个递归的cte在一个 LATERAL
...
2条答案
按热度按时间disbfnqx1#
我使用过窗口函数,但是我不确定这种解决方案的性能:
unftdfkk2#
事实证明,您可以在不知道pl/pgsql的情况下编写自定义聚合,下面是我所做的。一般来说,应该是这样
retval_consecutive[2] = x - 1
第一个“连续第x天”。用法: