SQL Server Split one record into multiple rows based on date range

qco9c6ql  于 12个月前  发布在  其他
关注(0)|答案(2)|浏览(88)

I want to create a list similar to the result. This should provide an overview of the net prices of a product. Always within the product's duration (start/end date).

A product can have one or more discounts. If there are multiple discounts, the percentages should be added together. So, in a period of 2x 5%, the total is 10%.

The discount may have a different start/end date than the product. The end date can be later than that of the product, but then the product's end date applies. The same goes for the start date.

The result table should fill in with an query. I'm having difficulty splitting the lines into lines per date range.

Example:

Product
product_id  start_date  end_date    gross_price
1           01-01-2023  01-01-2024  5
2           01-01-2023  01-01-2024  10
3           01-01-2023  01-01-2024  20


product_discount
product_id  product_discount_id percentage  start_date  end_date
1           1                   10          01-01-2023  01-01-2024
2           2                   10          01-01-2023  01-07-2023
3           3                   5           01-01-2023  01-01-2024  
3           4                   5           01-01-2023  01-07-2023


Result:
product_id  start_date  end_date    gross_price precentage  net_price
1           01-01-2023  01-01-2024  5           10          4.50

2           01-01-2023  30-06-2023  10          10          9.00
2           01-07-2023  01-01-2024  10          0           10.00

3           01-01-2023  30-06-2023  20          10          18.00
3           01-07-2023  01-01-2024  20          5           19.00

I've no idea where to start..

j2cgzkjk

j2cgzkjk1#

The simplest and most naive case is that...

  1. No discount overlaps with another discount
  2. No discount starts or ends outside of the product start and end
  3. One product can only have one start and end

Which means that...

  • There is no discount until the first discount
  • Every discount is followed by no discount, until the next discount starts
  • The last row ends when the product ends
WITH
  discount_lookahead AS
(
  SELECT
    d.*,
    LEAD(d.start_date) AS next_start_date
  FROM
    product_discount AS d
)

-- Create the discount and null following records

SELECT
  dl.product_id,
  t.product_discount_id,
  t.start_date,
  COALESCE(t.end_date, p.end_date),
  t.percentage
FROM
  product            AS p
INNER JOIN
  discount_lookahead AS dl
    ON dl.product_id = p.product_id
CROSS APPLY
(
  SELECT
    dl.product_discount_id,
    dl.start_date,
    dl.end_date,
    dl.percentage

  UNION ALL

  SELECT
    NULL,
    dl.end_date,
    dl.next_start_date,
    NULL
) 
  AS t

UNION ALL

-- create the initial records

SELECT
  p.product_id,
  NULL,
  p.start_date,
  COALESCE(d.start_date, p.end_date),
  NULL
FROM
  product            AS p
OUTER APPLY
(
  SELECT TOP (1) start_date
    FROM product_discount
   WHERE product_id = p.product_id
ORDER BY start_date
)
  AS d

ORDER BY
  product_id,
  start_date
biswetbf

biswetbf2#

This can be done by:

  1. Extracting all unique start and end dates from both the products and product-discounts tables.
  2. Sorting the dates and defining a series of adjacent, non-overlapping date ranges covering all of the dates of interest.
  3. Join the generated date ranges with both the products and product-discounts, filtering to only include overlapping date-ranges.
  4. Sum up the discounts and calculate the results.
  5. Merge adjacent rows with the same discount and where the end-date of prior row equals the start date of the next row.

The above logic is performed independently for each product_id value.

This solution assumes exclusive end dates, where a product offering or discount ends at 00:00:00 on the given date - effectively ending at roughly 23:59:59.999999 the day before. The end of one interval exactly equals the start of the next interval. This works well with date range comparisons and avoids having to do off-by-one adjustments during the calculations.

A standard comparison for overlapping intervals is A.start < B.end AND B.start < A.end . This nicely handles all variations of partially or fully overlapping intervals.

(Normally with date-range overlaps, it is necessary to calculate the overlap portion with GREATEST(A.start, B.start) AS overlap_start and LEAST(A.end, B.end) AS overlap_end . However, the way the date ranges have been defined here, they would never be further divided, so their start/end dates can be used directly.)

For the final merge, the logic uses a nested query that flags each rows as being either the same (0) or different (1) from the prior row. It then calculates a running sum of these values, which becomes the key GROUP BY value use to merge similar rows. The combined start and end dates are then just the min(start) and max(end) of the combined rows.

The code below uses several Common Table Expressions (CTEs) to build up the list of dates and date intervals before feeding those intermediate results into the final query.

WITH CTE_ProductDates AS (
    -- Gather all dates for each poroduct_id
    -- The UNION will eliminate duplicates
    SELECT product_id, start_date AS date FROM Product
    UNION
    SELECT product_id, end_date AS date FROM Product
    UNION
    SELECT product_id, start_date AS date FROM product_discount
    UNION
    SELECT product_id, end_date AS date FROM product_discount
),
CTE_ProductDateRanges AS (
    -- Generate date ranges between each pair of seuential dates for each poroduct_id
    SELECT *
    FROM (
        SELECT
            product_id,
            date as start_date,
            LEAD(date) OVER(PARTITION BY product_id ORDER BY date) AS end_date
        FROM CTE_ProductDates
    ) D
    WHERE end_date IS NOT NULL -- Eliminate first row from above having a null end_date
),
CTE_PreliminaryResults AS (
    SELECT
        P.product_id,
        R.start_date,
        R.end_date,
        P.gross_price,
        TOT.percentage,
        P.gross_price * (1 - 0.01 * TOT.percentage) AS net_price
    FROM Product P
    JOIN CTE_ProductDateRanges R
        ON R.product_id = P.product_id
        AND (R.start_date < P.end_date AND P.start_date < R.end_date) -- Overlap
    CROSS APPLY (
        SELECT ISNULL(SUM(D.percentage), 0) AS percentage
        FROM product_discount D
        WHERE D.product_id = R.product_id
        AND (D.start_date < R.end_date AND R.start_date < D.end_date) -- Overlap
    ) TOT
)
-- To see any of the intermediate results, uncomment any of the following,
-- while commenting out the other selects. 
--    SELECT * FROM CTE_ProductDates ORDER BY product_id, date;
--    SELECT * FROM CTE_ProductDateRanges ORDER BY product_id, start_date;
--    SELECT * FROM CTE_PreliminaryResults ORDER BY product_id, start_date;
SELECT
    product_id,
    MIN(start_date) as start_date,
    MAX(end_date) as end_date,
    gross_price,
    percentage,
    net_price
    --,COUNT(*) AS MergeCount
FROM (
    SELECT *, SUM(Change) OVER(PARTITION BY product_id ORDER BY start_date) AS grp
    FROM (
        SELECT *,
             CASE
             WHEN 
                 start_date = lag(end_date) over(partition by product_id order by start_date)
                 AND percentage = lag(percentage) over(partition by product_id order by start_date)
             THEN 0
             ELSE 1
             END AS change
        from CTE_PreliminaryResults PR
    ) PR
) PR
GROUP BY product_id, grp, gross_price, percentage, net_price
ORDER BY product_id, start_date;

Results:

product_idstart_dateend_dategross_pricepercentagenet_price
12023-01-012024-01-015104.50
22023-01-012023-07-0110109.00
22023-07-012024-01-0110010.00
32023-01-012023-07-01201018.00
32023-07-012024-01-0120519.00

Note that the transition dates between discounts for products 2 and 3 are slightly off. This seems to be a problem with the end-date interpretations as being either inclusive vs exclusive. I have assumed exclusive end-dates, which seemed consistent with the sample data having end-dates at the first of the next month, but where the discount is actually valid through the end of the last day of the previous month.

The algorithm can be changed to support inclusive end-dates by temporarily adding one day to those dates to use in the calculations. (Date range algorithms just work better this way.) The end-dates can be converted back in the end result.

However, even when I change the logic to assume inclusive end-dates, the results still do not match the OP expected results. There is no way that a discount end-date of 1 July can become an end-date of 30 June.
| product_id | start_date | end_date | gross_price | percentage | net_price |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 1 | 2023-01-01 | 2024-01-01 | 5 | 10 | 4.50 |
| 2 | 2023-01-01 | 2023-07-01 | 10 | 10 | 9.00 |
| 2 | 2023-07-02 | 2024-01-01 | 10 | 0 | 10.00 |
| 3 | 2023-01-01 | 2023-07-01 | 20 | 10 | 18.00 |
| 3 | 2023-07-02 | 2024-01-01 | 20 | 5 | 19.00 |

See this db<>fiddle for a working demo. This includes both the above exclusive end-date solution and an alternate solution that assumes inclusive end dates. It also includes additional test data that demonstrates the final merging of adjacent ranges having the same discount.

相关问题