oracle 重叠和不重叠的间隔变成连续的间隔

6qfn3psc  于 2023-11-17  发布在  Oracle
关注(0)|答案(4)|浏览(180)

我是DB世界的初学者,我正在使用ORACLE DB,有一个具体的问题,我在网上找不到任何充分的解释。
我有一个表,它有列(VALID_FROM(date)VALID_UNTIL(date)和valid(number))VALID FROM是强制性的,而VALID_UNTIL是可空的(比如说无穷大),我想获取所有的间隔(它们可以重叠,也不需要重叠),并将它们变成各自的非重叠间隔(并将非重叠间隔中的所有计数相加)
这是正确的输入和输出的例子:输入:
| 有效期为|有效期至|计数|
| --|--|--|
| 2022年01月01日|2022年10月1日星期一| 11 |
| 2023年01月01日|NULL| 1 |
| 2023年10月1日星期一|2023年10月19日| 2 |
| 2023年10月03日|2023年10月12日| 5 |
输出量:
| 有效期为|有效期至|计数|
| --|--|--|
| 2022年01月01日|2022年10月1日星期一| 11 |
| 1.1.23| 30.9.23| 1 |
| 1.10.23| 2.10.23| 3 |
| 3.10.23| 12.10.23| 8 |
| 13.10.23| 19.10.23| 3 |
| 20.10.23| NULL| 1 |
什么是最有效的查询来实现这一点,这是迄今为止最困难的事情,我试图学习。时期是噩梦。谢谢
我试过递归,cte,没有得到任何结果

t98cgbkg

t98cgbkg1#

您可以在不需要行生成器、自联接或聚合的情况下完成此操作。
将日期反透视,以便将它们放在一列中,然后使用分析函数生成计数的运行总数,当您到达“from”日期时,将值相加,当您到达“until”日期时,将值相减。然后您可以使用LEAD分析函数查找下一个日期,如果新的valid_from是一个“直到”值或者如果valid_until是一个“从”值,则它将成为范围的valid_until值,并按一天进行调整:

SELECT *
FROM   (
  SELECT dt AS valid_from,
         LEAD(dt) OVER (ORDER BY dt)
           - INTERVAL '1' SECOND AS valid_until,
         SUM(count*type) OVER (ORDER BY dt) AS count
  FROM   (
    SELECT valid_from,
           valid_until + INTERVAL '1' DAY AS valid_until,
           count
    FROM   table_name
  )
  UNPIVOT INCLUDE NULLS (
    dt FOR type IN (
      valid_from  AS 1,
      valid_until AS -1
    )
  )
)
WHERE  count > 0
AND    (valid_from <= valid_until OR valid_until IS NULL);

字符串
其中,对于样本数据:

CREATE TABLE table_name (VALID_FROM, VALID_UNTIL, COUNT) AS
SELECT DATE '2022-01-01', DATE '2022-10-01', 11 FROM DUAL UNION ALL
SELECT DATE '2022-10-01', DATE '2022-10-02', 11 FROM DUAL UNION ALL
SELECT DATE '2022-10-03', DATE '2022-10-04', 11 FROM DUAL UNION ALL
SELECT DATE '2023-01-01', NULL,               1 FROM DUAL UNION ALL
SELECT DATE '2023-10-01', DATE '2023-10-19',  2 FROM DUAL UNION ALL
SELECT DATE '2023-10-03', DATE '2023-10-12',  5 FROM DUAL;


产出:
| VALID_FROM|有效期至|计数|
| --|--|--|
| 2022-01-01 00:00:00| 2022-09-30 23:59:59| 11 |
| 2019 -10-01 00:00:00| 2022-10-01 23:59:59| 22 |
| 2019 -02 - 22 00:00:00| 2022-10-02 23:59:59| 11 |
| 2019 - 09 - 22 00:00:00| 2022-10-04 23:59:59| 11 |
| 2023-01-01 00:00:00| 2023-09-30 23:59:59| 1 |
| 2023-10-01 00:00:00| 2023-10-02 23:59:59| 3 |
| 2019 -10-03 00:00:00| 2023-10-12 23:59:59| 8 |
| 2023-10-13 00:00:00| 2023-10-19 23:59:59| 3 |
| 2019 -10-20 00:00:00| * 空 *| 1 |

  • 注意:如果你只想要每天的开始,请将SELECT *更改为SELECT valid_from, TRUNC(valid_until) AS valid_until, count

如果你想合并具有相同计数的连续范围,那么你可以将输出传递给MATCH_RECOGNIZE来执行逐行模式匹配:

SELECT *
FROM   (
  SELECT dt AS valid_from,
         LEAD(dt) OVER (ORDER BY dt)
           - INTERVAL '1' SECOND AS valid_until,
         SUM(count*type) OVER (ORDER BY dt) AS count
  FROM   (
    SELECT valid_from,
           valid_until + INTERVAL '1' DAY AS valid_until,
           count
    FROM   table_name
  )
  UNPIVOT INCLUDE NULLS (
    dt FOR type IN (
      valid_from  AS 1,
      valid_until AS -1
    )
  )
)
MATCH_RECOGNIZE(
  ORDER BY valid_from
  MEASURES
    FIRST(valid_from) AS valid_from,
    LAST(valid_until) AS valid_until,
    FIRST(count) AS count
  PATTERN (first_row same*)
  DEFINE
    same AS  FIRST(count) = count
         AND PREV(valid_until) + INTERVAL '1' SECOND = valid_from
)
WHERE  count > 0;


产出:
| VALID_FROM|有效期至|计数|
| --|--|--|
| 2022-01-01 00:00:00| 2022-09-30 23:59:59| 11 |
| 2019 -10-01 00:00:00| 2022-10-01 23:59:59| 22 |
| 2019 -02 - 22 00:00:00| 2022-10-04 23:59:59| 11 |
| 2023-01-01 00:00:00| 2023-09-30 23:59:59| 1 |
| 2023-10-01 00:00:00| 2023-10-02 23:59:59| 3 |
| 2019 -10-03 00:00:00| 2023-10-12 23:59:59| 8 |
| 2023-10-13 00:00:00| 2023-10-19 23:59:59| 3 |
| 2019 -10-20 00:00:00| * 空 *| 1 |
fiddle

oymdgrw7

oymdgrw72#

这可能不是最有效的方法-你是对的,重叠的时间段很难!-但它应该工作:

WITH your_table AS (SELECT to_date('01/01/2022', 'dd/mm/yyyy') valid_from, to_date('01/10/2022', 'dd/mm/yyyy') valid_to, 11 cnt FROM dual UNION ALL
                    SELECT to_date('01/01/2023', 'dd/mm/yyyy') valid_from, NULL valid_to, 1 cnt FROM dual UNION ALL
                    SELECT to_date('01/10/2023', 'dd/mm/yyyy') valid_from, to_date('19/10/2023', 'dd/mm/yyyy') valid_to, 2 cnt FROM dual UNION ALL
                    SELECT to_date('03/10/2023', 'dd/mm/yyyy') valid_from, to_date('12/10/2023', 'dd/mm/yyyy') valid_to, 5 cnt FROM dual),
  list_of_dates AS (SELECT min_date + LEVEL - 1 dt, max_date, null_valid_to_present
                    FROM   (SELECT MIN(valid_from) min_date, GREATEST(MAX(valid_from), MAX(valid_to)) + 1 max_date, MAX(CASE WHEN valid_to IS NULL THEN 'Y' END) null_valid_to_present
                            FROM   your_table)
                    CONNECT BY LEVEL <= (max_date - min_date) + 1),
       grp_info AS (SELECT dts.dt,
                           dts.max_date,
                           dts.null_valid_to_present,
                           SUM(yt.cnt) cnt,
                           dt - row_number() OVER (PARTITION BY SUM(yt.cnt) ORDER BY dt) grp
                    FROM   list_of_dates dts
                           INNER JOIN your_table yt ON dts.dt >= yt.valid_from
                                                            AND dts.dt <= COALESCE(yt.valid_to, dts.dt + 1)
                    GROUP BY dts.dt,
                             dts.max_date,
                             dts.null_valid_to_present)
SELECT MIN(dt) valid_from,
       CASE WHEN null_valid_to_present = 'Y' AND MIN(dt) = max_date THEN NULL ELSE MAX(dt) END valid_to,
       cnt
FROM   grp_info
GROUP BY cnt, grp, max_date, null_valid_to_present
ORDER BY MIN(dt);

VALID_FROM  VALID_TO           CNT
----------- ----------- ----------
01/01/2022  01/10/2022          11
01/01/2023  30/09/2023           1
01/10/2023  02/10/2023           3
03/10/2023  12/10/2023           8
13/10/2023  19/10/2023           3
20/10/2023                       1

字符串
首先,我们计算出数据集中最低可用valid_from日期(假设valid_to日期永远不会早于valid_from日期)和最高日期之间的日期列表,我们将其加1(以说明valid_to为空的行),然后我们将原始数据集内部连接到这个日期列表。
接下来,我们使用一种名为tabibitosan的技术来为每一行生成一个组标识符。这是通过从行的对应dt中减去按dt排序的cnt组内的行号来实现的。连续且具有相同cnt值的行将以相同的差值结束,否则差值将改变。
最后,我们使用组标识符对最终输出进行分组,如果原始数据集中有NULL valid_to date,则沿着一些小技巧输出NULL。

huwehgph

huwehgph3#

这可能也不是非常有效;它多次命中表,但不会扩展日期范围,因此可能比@boneist的方法更好或更差。
首先,根据重叠的截止日期中的真实的行和隐含的行,计算出所有的期间开始日期:

-- real rows
select valid_from, valid_until
from your_table
union all
-- implied rows based on previous valid_until which overlaps
select valid_until + 1, valid_until
from your_table t1
where valid_until is not null
and exists
(
  select null from your_table t2
  where t2.id != t1.id
  and t2.valid_from <= t1.valid_from
  and t2.valid_until >= t1.valid_until
)
union all
-- implied row to replace open-ended row, if there is one
select valid_until + 1, null
from your_table t1
where valid_until is not null
and exists
(
  select null
  from your_table t2
  where t2.valid_from < t1.valid_from
  and t2.valid_until is null
)
and not exists
(
  select null
  from your_table t2
  where t2.valid_until > t1.valid_until
)
order by valid_from, valid_until

字符串
| VALID_FROM|有效期至|
| --|--|
| 2022-01-01 2022-01-01| 2022-10-01 2022-10-01|
| 2023年1月1日| * 空 *|
| 2023-10-01 2023-10-01| 2023年10月19日|
| 2023-10-03 - 01| 2023年10月12日|
| 2023-10-13 - 01 - 01 - 01| 2023年10月12日|
| 2023年10月20日| * 空 *|
我在你的表中添加了一个ID列,以避免行在exist子句中匹配自己。如果你没有唯一的标识符,那么你可以使用valid_fromvalid_to,只要它们的组合是唯一的。
然后根据周围的行,使用前面的查询作为CTE,计算出每个起始日期的有效截止日期:

with cte1 (valid_from, valid_until) as
(
  -- previous query, omitted for brevity
)
select valid_from,
  case when valid_until is null
         or lead(valid_from) over (order by valid_from)
              <= max(valid_until) over (order by valid_from) + 1
       then lead(valid_from) over (order by valid_from) - 1
       else valid_until
  end as valid_until
from cte1
order by valid_from, valid_until


| VALID_FROM|有效期至|
| --|--|
| 2022-01-01 2022-01-01| 2022-10-01 2022-10-01|
| 2023年1月1日|2023-09-30 -09- 09 - 09|
| 2023-10-01 2023-10-01| 2023-10-02 2023-10-02|
| 2023-10-03 - 01| 2023年10月12日|
| 2023-10-13 - 01 - 01 - 01| 2023年10月19日|
| 2023年10月20日| * 空 *|
然后使用它作为另一个CTE,并连接回真实的数据,以获取匹配计数并求和:

with cte1 (valid_from, valid_until) as
(
  -- previous query, omitted for brevity
),
cte2 (valid_from, valid_until) as
(
  -- previous query, omitted for brevity
)
select cte2.valid_from, cte2.valid_until, sum(t.cnt) as cnt
from cte2
join your_table t
on t.valid_from <= cte2.valid_from
and (t.valid_until is null or t.valid_until >= cte2.valid_until)
group by cte2.valid_from, cte2.valid_until
order by cte2.valid_from, cte2.valid_until


或全文:

with cte1 (valid_from, valid_until) as
(
  select valid_from, valid_until
  from your_table
  union all
  select valid_until + 1, valid_until
  from your_table t1
  where valid_until is not null
  and exists
  (
    select null
    from your_table t2 
    where t2.id != t1.id
    and t2.valid_from <= t1.valid_from
    and t2.valid_until >= t1.valid_until
  )
  union all
  select valid_until + 1, null
  from your_table t1
  where valid_until is not null
  and exists
  (
    select null
    from your_table t2
    where t2.valid_from < t1.valid_from
    and t2.valid_until is null
  )
  and not exists
  (
    select null
    from your_table t2
    where t2.valid_until > t1.valid_until
  )
),
cte2 (valid_from, valid_until) as
(
  select valid_from,
    case when valid_until is null
           or lead(valid_from) over (order by valid_from)
                <= max(valid_until) over (order by valid_from) + 1
         then lead(valid_from) over (order by valid_from) - 1
         else valid_until
    end as valid_until
  from cte1
)
select cte2.valid_from, cte2.valid_until, sum(t.cnt) as cnt
from cte2
join your_table t
on t.valid_from <= cte2.valid_from
and (t.valid_until is null or t.valid_until >= cte2.valid_until)
group by cte2.valid_from, cte2.valid_until
order by cte2.valid_from, cte2.valid_until


| VALID_FROM|有效期至|CNT|
| --|--|--|
| 2022-01-01 2022-01-01| 2022-10-01 2022-10-01| 11 |
| 2023年1月1日|2023-09-30 -09- 09 - 09| 1 |
| 2023-10-01 2023-10-01| 2023-10-02 2023-10-02| 3 |
| 2023-10-03 - 01| 2023年10月12日| 8 |
| 2023-10-13 - 01 - 01 - 01| 2023年10月19日| 3 |
| 2023年10月20日| * 空 *| 1 |
fiddle带有一些额外的查询,显示了一些工作。

tpgth1q7

tpgth1q74#

如果你准备好可能的间隔列表并测试交叉点,我可能会更容易:

WITH data(id, valid_from, valid_until, cnt) as (
        select 1, to_date('01/01/2022', 'dd/mm/yyyy'), to_date('01/10/2022', 'dd/mm/yyyy'), 11 from dual union all
        select 2, to_date('01/01/2023', 'dd/mm/yyyy'), NULL, 1 from dual union all
        select 3, to_date('01/10/2023', 'dd/mm/yyyy'), to_date('19/10/2023', 'dd/mm/yyyy'), 2 from dual union all
        select 4, to_date('03/10/2023', 'dd/mm/yyyy'), to_date('12/10/2023', 'dd/mm/yyyy'), 5 from dual -- union all
    ),
    allintervals as (
        select * from (
            select dat as date_from, lead(dat) over(order by dat, pos) as date_until from 
            (
                select 2 - mod(row_number() over(order by dat),2) as pos, dat from 
                (
                    select valid_from as dat from data
                    union
                    select nvl(valid_until, to_date('31/12/2999','dd/mm/yyyy')) from data
                )
            )
        )
        where date_until is not null
    )
    select date_from, nullif(date_until,to_date('31/12/2999','dd/mm/yyyy')) as date_until, sum(cnt) as cnt 
    from data d
    join allintervals ai on 
        d.valid_from < ai.date_until 
        and
        nvl(d.valid_until, to_date('31/12/2999','dd/mm/yyyy')) > ai.date_from
    group by date_from, date_until
    ;

01/01/2022  01/10/2022  11
01/01/2023  01/10/2023  1
01/10/2023  03/10/2023  3
03/10/2023  12/10/2023  8
12/10/2023  19/10/2023  3
19/10/2023  null        1

字符串

相关问题