SQL Server Create membership timeseries from static table

aurhwmvo  于 2023-02-18  发布在  其他
关注(0)|答案(1)|浏览(98)

I have a table (members) which shows users and any additions or deletions to a specific membership and on what date as below
| User | Membership | Change | Date |
| ------------ | ------------ | ------------ | ------------ |
| 1 | 100 | A | 01/01/1900 |
| 1 | 101 | A | 01/01/1990 |
| 1 | 100 | D | 01/01/2000 |
| 2 | 100 | A | 01/12/1990 |
| 2 | 101 | A | 01/01/1991 |
| 2 | 101 | D | 01/12/1991 |
| 2 | 100 | D | 01/01/1993 |
| 3 | 100 | A | 01/01/2000 |

I'm looking to find cases where one user is in multiple memberships in overlapping time periods, I'm using the below but it's pulling the wrong end dates when I run it

With membership As

    (Select user, membership, date as Start_Date, 
    LEAD (date, 1, '31/12/9999') OVER (PARTITION BY membership ORDER BY 
    date) AS End_Date

    FROM
        (Select *, LAG(change,1,-1) OVER(PARTITION BY membership ORDER BY 
        date) AS Previous_change
        From members) withprevious
        Where change != previous_change),
    MemberTimeSeries AS
    (Select *
    From membership
    Where Start_Date IN
        (Select a.Date
        From members a
        join membership b
        on a.user = b.user
        and a.membership = b.membership
        Where a.change = 'A')),
    DupeIDs AS
    (Select Distinct a.user, a.membership, cast(a.start_date as date_ as 
    start_date, cast(a.end_date as date) as end_date
        from membertimeseries a
        join membertimeseries b
        on a.user = b.user
        and ((a.start_date >= b.start_date abd a.start_date < b.end_date)
        Or (a.end_date > b.start_date and a.end_date <= b.end_date)
        OR (b.start_date >= a.start_date abd b.start_date < a.end_date)
        Or (b.end_date > a.start_date and b.end_date <= a.end_date)

I'm looking to see any user and membership combinations with respective start and end dates where it overlaps with another membership combination for the same user during any period of it's active membership. If there is no deletion in the table I want date to default to 31/12/9999

For example I want to see the below from my example
| User | Membership | Start_Date | End_Date |
| ------------ | ------------ | ------------ | ------------ |
| 1 | 100 | 01/01/1900 | 01/01/2000 |
| 1 | 101 | 01/01/1990 | 31/12/9999 |
| 2 | 100 | 01/12/1990 | 01/01/1993 |
| 2 | 101 | 01/01/1991 | 01/12/1991 |

oyxsuwqo

oyxsuwqo1#

After reviewing at your updated code, I decided it might be better to take a different approach.

You can combine the add and drop rows into ranges by first selecting the adds and then using an OUTER APPLY to look up the first following drop row with the same user and membership (if one exists).

If the above is wrapped up in a Common Table Expression (CTE), overlaps can then be identified by joining the ranges with themselves and limiting the join to those with the same user, overlapping date ranges, and excluding same row joins.

A standard test for overlapping date ranges is Start1 < End2 AND Start2 < End1 . For exclusive end-dates (as appears to be the case here) < is used. If the ranges were defined with inclusive end dates <= would be used. Omitted end dates can be handled with an end-of-time default ISNULL(EndDate, '9999-12-31') '

To omit self matches, the following uses ROW_NUMBER() to assign distinct row IDs. You could alternately check for "not (all values equal)".

;WITH Ranges AS (
    SELECT
        A.[User], A.Membership, A.Date AS StartDate, D.Date AS EndDate,
        ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RowId
    FROM Memberships A
    OUTER APPLY (
        SELECT TOP 1 D.*
        FROM Memberships D
        WHERE D.Change = 'D'
        AND D.[User] = A.[User]
        AND D.Membership = A.Membership
        AND D.Date > A.Date
        ORDER BY D.Date
    ) D
    WHERE A.Change = 'A'
)
SELECT DISTINCT R.[User], R.Membership, R.StartDate, R.EndDate
FROM Ranges R
JOIN Ranges R2
    ON R2.[User] = R.[User]
    AND R2.RowId <> R.RowId
    AND R2.StartDate < ISNULL(R.EndDate, '2099-12-31')
    AND R.StartDate < ISNULL(R2.EndDate, '2099-12-31')
ORDER BY R.[User], R.StartDate

In case of multiple overlaps, DISTINCT eliminates dups. An alternative would be to replace the JOIN with a WHERE EXISTS(...) .

Results:
| User | Membership | StartDate | EndDate |
| ------------ | ------------ | ------------ | ------------ |
| 1 | 100 | 1900-01-01 | 2000-01-01 |
| 1 | 101 | 1990-01-01 | null |
| 2 | 100 | 1990-12-01 | 1993-01-01 |
| 2 | 101 | 1991-01-01 | 1991-12-01 |
| 5 | 100 | 2020-01-01 | 2020-02-01 |
| 5 | 102 | 2020-01-15 | 2020-02-15 |
| 5 | 101 | 2020-02-01 | 2020-03-01 |

See this db<>fiddle .

相关问题