SQL Server How to get date ranges over equal consecutive values having a single date

wlp8pajw  于 2023-03-07  发布在  其他
关注(0)|答案(1)|浏览(126)

I have a table like this:

CREATE TABLE Rates
(
   RateGroup int NOT NULL,
   Rate decimal(5, 2) NOT NULL,
   DueDate date NOT NULL
);

This table contains rates which are valid from a certain due date to the day before the next due date. If no next due date is present, the rate's validity has no end. There can be multiple consecutive due days with the same rate and a certain can appear on different non consecutive due days as well.

The rates are divided into groups. A single due date can appear in multiple groups but only once per group.

Here's some example data:

INSERT INTO Rates(RateGroup, Rate, DueDate)
VALUES
      (1, 1.2, '20210101'), (1, 1.2, '20210215'), (1, 1.5, '20210216'),
      (1, 1.2, '20210501'), (2, 3.7, '20210101'), (2, 3.7, '20210215'),
      (2, 3.7, '20210216'), (2, 3.7, '20210501'), (3, 2.9, '20210101'),
      (3, 2.5, '20210215'), (3, 2.5, '20210216'), (3, 2.1, '20210501');
RateGroupRateDueDate
11.202021-01-01
11.202021-02-15
11.502021-02-16
11.202021-05-01
23.702021-01-01
23.702021-02-15
23.702021-02-16
23.702021-05-01
32.902021-01-01
32.502021-02-15
32.502021-02-16
32.102021-05-01

Now I want a query which folds multiple consecutive rows of a rate group with the same rate to a single row containing the date range (start and end date) where the rate is valid.

This is the desired result:
| RateGroup | Rate | StartDate | EndDate |
| ------------ | ------------ | ------------ | ------------ |
| 1 | 1.20 | 2021-01-01 | 2021-02-15 |
| 1 | 1.50 | 2021-02-16 | 2021-04-30 |
| 1 | 1.20 | 2021-05-01 | NULL |
| 2 | 3.70 | 2021-01-01 | NULL |
| 3 | 2.90 | 2021-01-01 | 2021-02-14 |
| 3 | 2.50 | 2021-02-15 | 2021-04-30 |
| 3 | 2.10 | 2021-05-01 | NULL |

How can I achieve this?

qxsslcnc

qxsslcnc1#

This can be done with Common Table Expressions uilizing the OVER Clause as in the following query:

WITH
    RatesWithBegin AS
        (
            SELECT RateGroup, Rate, DueDate,
                    CASE
                        WHEN Rate = LAG(Rate) OVER (PARTITION BY RateGroup ORDER BY DueDate)
                            THEN 0
                        ELSE 1
                    END AS IsBegin
                FROM Rates
        ),
    RatesFromTo AS
        (
            SELECT RateGroup, Rate, DueDate AS StartDate,
                    LEAD (DATEADD(day, -1, DueDate)) OVER
                        (
                            PARTITION BY RateGroup
                            ORDER BY DueDate
                        ) AS EndDate,
                    SUM (IsBegin) OVER
                        (
                            PARTITION BY RateGroup
                            ORDER BY DueDate
                            ROWS UNBOUNDED PRECEDING
                        ) AS RangeID
                FROM RatesWithBegin
        )
    SELECT RateGroup, MAX(Rate) AS Rate, MIN(StartDate) AS StartDate,
            NULLIF(MAX(COALESCE(EndDate, '99990101')), '99990101') AS EndDate
        FROM RatesFromTo
        GROUP BY RateGroup, RangeID
        ORDER BY RateGroup, StartDate;

How does it work?

RatesWithBegin AS
    (
        SELECT RateGroup, Rate, DueDate,
                CASE
                    WHEN Rate = LAG(Rate) OVER (PARTITION BY RateGroup ORDER BY DueDate)
                        THEN 0
                    ELSE 1
                END AS IsBegin
            FROM Rates
    ),

Here we are using LAG() to compare the current rate to it's predecessor. PARTITION BY RateGroup makes sure that we don't mix rate groups and ORDER BY DueDate determines the order at which we look at the rows.

If the current rate is equal to it's predecessor we mark the row with a 0 otherwise with a 1. The result of this CTE would look like so for the first rate group:
| RateGroup | Rate | DueDate | IsBegin |
| ------------ | ------------ | ------------ | ------------ |
| 1 | 1.20 | 2021-01-01 | 1 |
| 1 | 1.20 | 2021-02-15 | 0 |
| 1 | 1.50 | 2021-02-16 | 1 |
| 1 | 1.20 | 2021-05-01 | 1 |

0 and 1 are no arbitrary values; they are needed for the next step.

RatesFromTo AS
    (
        SELECT RateGroup, Rate, DueDate AS StartDate,
                LEAD (DATEADD(day, -1, DueDate)) OVER
                    (
                        PARTITION BY RateGroup
                        ORDER BY DueDate
                    ) AS EndDate,
                SUM (IsBegin) OVER
                    (
                        PARTITION BY RateGroup
                        ORDER BY DueDate
                        ROWS UNBOUNDED PRECEDING
                    ) AS RangeID
            FROM RatesWithBegin
    )

In this CTE we build a running total over the IsBegin column with SUM() . Due to the value being 1 at the start of a new range and 0 within a range our running total increments always at the begin of a new range. This leads to an uniqe number for each range.

With LEAD() we add the day before the next due date in our range to the output. The result of this step for the first rate group is then:
| RateGroup | Rate | StartDate | EndDate | RangeID |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| 1 | 1.20 | 2021-01-01 | 2021-02-14 | 1 |
| 1 | 1.20 | 2021-02-15 | 2021-02-15 | 1 |
| 1 | 1.50 | 2021-02-16 | 2021-04-30 | 2 |
| 1 | 1.20 | 2021-05-01 | NULL | 3 |

SELECT RateGroup, MAX(Rate) AS Rate, MIN(StartDate) AS StartDate,
        NULLIF(MAX(COALESCE(EndDate, '99990101')), '99990101') AS EndDate
    FROM RatesFromTo
    GROUP BY RateGroup, RangeID
    ORDER BY RateGroup, StartDate;

Now that we have an unique identifier ( RangeID ) for the date ranges we can do a simple aggregation with GROUP BY to get our desired result. Since the ranges can be open ended (no next due date) we use

NULLIF(MAX(COALESCE(EndDate, '99990101')), '99990101')

to make sure that NULL is always treated as the latest possible date.

相关问题