SQL Server Select first record in group where no overlap

jmo0nnb3  于 2023-02-28  发布在  其他
关注(0)|答案(1)|浏览(127)

I have a set of rows which looks like this, ordered by First_Date

Id         First_Date   Second_Date
ABC12345   2022-05-18   2022-08-16
ABC12345   2022-05-23   2022-08-29
ABC12345   2022-06-06   2022-11-23
ABC12345   2022-06-16   2022-11-23
ABC12345   2022-08-16   2022-11-23
ABC12345   2022-11-01   2022-11-29

After always keeping the first rows, I want to include all following rows where the first_date does not fall between (exclusive) the two dates in this first record.

Once there is a row which is not excluded (example above would be fifth record), then it would check the next row against this until the end.

So I only want to keep dates where there is no overlap at all - same dates are OK, just not between.

So I want the final result to look like this

Id         First_Date   Second_Date   Flag
ABC12345   2022-05-18   2022-08-16    Keep
ABC12345   2022-05-23   2022-08-29    Exclude
ABC12345   2022-06-06   2022-11-23    Exclude
ABC12345   2022-06-16   2022-11-23    Exclude
ABC12345   2022-08-16   2022-11-23    Keep
ABC12345   2022-11-01   2022-11-29    Exclude

Is there any way of doing this. I've tried LAG function, but can't get it to work.

pqwbnv8z

pqwbnv8z1#

I'll assume that if multiple Id values are present, each should be handled separately.

The following will use several Common Table Expressions (CTEs) to (1) Define a well-defined row ordering, (2) Define a relationship between each row and the first eligible "following" row, and finally (3) Chain the linked rows together using a recursive CTE that starts with the first row for each Id (RowNum = 1) and follows the chain of linked rows.

Following that, you can either select the chained rows directly to get the filtered data, or join them back to the original data to generate the Keep/Exclude flag as you posted in your requested results.

;WITH OrderedData AS (
    SELECT
        D.*,
        ROW_NUMBER() OVER(
            PARTITION BY D.Id
            ORDER BY D.First_Date, D.Second_Date DESC
            ) AS RowNum
    FROM Data D
),
LinkedData AS (
    SELECT D.*, D2.RowNum AS NextRowNum
    FROM OrderedData D
    OUTER APPLY(
        SELECT TOP 1 D2.*
        FROM OrderedData D2
        WHERE D2.Id = D.Id
        AND D2.RowNum > D.RowNum
        AND D2.First_Date >= D.Second_Date
        ORDER BY D2.RowNum
    ) D2
), ChainedData AS (
    SELECT LD.*
    FROM LinkedData LD
    WHERE LD.RowNum = 1
    UNION ALL
    SELECT LD.*
    FROM ChainedData CD
    JOIN LinkedData LD
        ON LD.Id = CD.Id
        AND LD.RowNum = CD.NextRowNum
)
SELECT
    D.*,
    CASE WHEN CD.RowNum IS NOT NULL THEN 'Keep' ELSE 'Exclude' END AS Flag
FROM OrderedData D
LEFT JOIN ChainedData CD
        ON CD.Id = D.Id
        AND CD.RowNum = D.RowNum
ORDER BY D.Id, D.RowNum

The OrderedData step might be skipped if you have no duplicate First_Date for a given Id. In that case, you could replace most occurrences or RowNum with First_Date. If there can be duplicate First_Date values, the next question is which record to chose - the one with the earliest Second_Date or the latest?

For efficiency you should probably place an index on Data(Id, First_Date) at a minimum.

See this db<>fiddle for a working demo with additional test data.

Sample results:
| Id | First_Date | Second_Date | RowNum | Flag |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| ABC12345 | 2022-05-18 | 2022-08-16 | 1 | Keep |
| ABC12345 | 2022-05-23 | 2022-08-29 | 2 | Exclude |
| ABC12345 | 2022-06-06 | 2022-11-23 | 3 | Exclude |
| ABC12345 | 2022-06-16 | 2022-11-23 | 4 | Exclude |
| ABC12345 | 2022-08-16 | 2022-11-23 | 5 | Keep |
| ABC12345 | 2022-11-01 | 2022-11-29 | 6 | Exclude |
| DEF00001 | 2022-01-01 | 2022-01-02 | 1 | Keep |
| DEF00001 | 2022-01-02 | 2022-01-05 | 2 | Keep |
| DEF00001 | 2022-01-03 | 2022-01-06 | 3 | Exclude |
| DEF00001 | 2022-01-04 | 2022-01-07 | 4 | Exclude |
| DEF00001 | 2022-01-05 | 2022-01-08 | 5 | Keep |
| DEF00001 | 2022-01-10 | 2022-01-20 | 6 | Keep |
| DEF00001 | 2022-01-15 | 2022-01-25 | 7 | Exclude |
| DEF00001 | 2022-01-22 | 2022-01-31 | 8 | Keep |

相关问题