SQL Server How to generate a gap in sequence table where the values returned are compressed, like page selection when printing 1,2, 3-10

eoxn13cs  于 2023-03-28  发布在  其他
关注(0)|答案(1)|浏览(108)

I'm needing to query a large set of numeric Id values (i.e. Auto Counters) across an API. There can be several million values which could be 100MB+ in the payload (just returning Ids!). Rather than returning each id per row, which can result in 1M+ rows, I want to compress it by finding each gap in sequence and return the following table. Is there a better way to approach this other than what I included in example 2?

Simple Example:

StartId | EndId
1       | 100000      -- no gaps from 1 to 100000
100002  | 100002      -- gap in sequence. 100001 missing
100004  | 100004      -- gap in sequence 100003 missing
100006  | 200000      -- gap in sequence 100005 missing

Example 1: The following code gives us part of the answer and executes fast, 9M rows in under a second! Cons: Don't have starting range value. Can we capture it here?

SELECT 
    TransactionNumber + 1   
FROM    TransactionJournal mo
WHERE   NOT EXISTS
        (
        SELECT  NULL
        FROM    TransactionJournal  mi 
        WHERE   mi.TransactionNumber = mo.TransactionNumber + 1
        )
ORDER BY
        mo.TransactionNumber

Example 2: The following gives us the answer but is slow in execution and the first row returns null? 9M rows @ 15 seconds on a super fast PC (Thread Ripper, 128 Cores, 256GB Ram). The number of rows will grow fast. 100's of consumers of the API would be facing the 15 second delay and the cost for all the concurrent connections.

SELECT  TransactionNumber - EndId as GapSize,
    LAG(TransactionNumber) OVER (ORDER BY TransactionNumber) StartId,
        EndId
FROM    (
        SELECT  TransactionNumber,
                LAG(TransactionNumber) OVER (ORDER BY TransactionNumber) EndId              
        FROM    TransactionJournal
        ) q
WHERE   EndId <> TransactionNumber - 1
ORDER BY
        TransactionNumber

Output of the above query

PS: Some of the above code comes from other SO poster(s). I've only added to it.

olmpazwi

olmpazwi1#

I decided asked Chat GBT-4 and here is what it wrote. There seems to be an issue with my original post where it doesn't include the last range. LAG was not the correct choice.

This corrects the issue and the performance was less than 1 second.

WITH ConsecutiveGroups AS (
        SELECT
            TransactionNumber,
            TransactionNumber - DENSE_RANK() OVER (ORDER BY TransactionNumber) AS GroupID
        FROM
            TransactionsJournal
    )
    , RangeStartEnd AS (
        SELECT
            Min(TransactionNumber) AS RangeStart,
            Max(TransactionNumber) AS RangeEnd
        FROM
            ConsecutiveGroups
        GROUP BY
            GroupID
    )
    SELECT
        RangeStart,
        RangeEnd
    FROM
        RangeStartEnd
    ORDER BY
        RangeStart;

相关问题