I need to query a large set of numeric Id values (i.e. auto-increment counters) through an API. There can be several million values, which can amount to 100 MB+ of payload (just returning Ids!). Rather than returning one Id per row, which can mean 1M+ rows, I want to compress the result by finding each gap in the sequence and returning the following table. Is there a better way to approach this than what I included in Example 2?
Simple Example:
StartId | EndId
1       | 100000  -- no gaps from 1 to 100000
100002  | 100002  -- gap in sequence: 100001 missing
100004  | 100004  -- gap in sequence: 100003 missing
100006  | 200000  -- gap in sequence: 100005 missing
Example 1: The following code gives part of the answer and executes fast: 9M rows in under a second! Con: it doesn't return the starting value of each range. Can we capture it here?
SELECT
    mo.TransactionNumber + 1
FROM TransactionJournal mo
WHERE NOT EXISTS
(
    SELECT NULL
    FROM TransactionJournal mi
    WHERE mi.TransactionNumber = mo.TransactionNumber + 1
)
ORDER BY
    mo.TransactionNumber
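Yes, the same NOT EXISTS pattern can capture the start of each range: rows with no predecessor are range starts, and each start can be paired with the smallest range end at or above it. A minimal sketch of that idea, verified with SQLite via Python (the table and column names come from the question; the sample data is illustrative, scaled down from the question's gaps):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TransactionJournal (TransactionNumber INTEGER PRIMARY KEY)")
# Illustrative data: contiguous runs 1-10 and 16-20, plus singletons 12 and 14
ids = list(range(1, 11)) + [12, 14] + list(range(16, 21))
con.executemany("INSERT INTO TransactionJournal VALUES (?)", [(i,) for i in ids])

# Range starts have no predecessor (no row at n - 1); each start is paired
# with the smallest range end >= it, where an end has no successor (no n + 1).
rows = con.execute("""
    SELECT s.TransactionNumber AS StartId,
           (SELECT MIN(e.TransactionNumber)
            FROM TransactionJournal e
            WHERE e.TransactionNumber >= s.TransactionNumber
              AND NOT EXISTS (SELECT 1 FROM TransactionJournal e2
                              WHERE e2.TransactionNumber = e.TransactionNumber + 1)
           ) AS EndId
    FROM TransactionJournal s
    WHERE NOT EXISTS (SELECT 1 FROM TransactionJournal s2
                      WHERE s2.TransactionNumber = s.TransactionNumber - 1)
    ORDER BY s.TransactionNumber
""").fetchall()
print(rows)  # [(1, 10), (12, 12), (14, 14), (16, 20)]
```

Whether this keeps Example 1's sub-second performance at 9M rows depends on the index on TransactionNumber; the correlated MIN subquery runs once per range start, not per row.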
Example 2: The following gives the full answer but is slow, and the first row returns NULL for StartId. 9M rows take about 15 seconds on a very fast PC (Threadripper, 128 cores, 256 GB RAM). The row count will grow quickly, and hundreds of API consumers would each face the 15-second delay, plus the cost of all the concurrent connections.
SELECT
    TransactionNumber - EndId AS GapSize,
    LAG(TransactionNumber) OVER (ORDER BY TransactionNumber) AS StartId,
    EndId
FROM
(
    SELECT
        TransactionNumber,
        LAG(TransactionNumber) OVER (ORDER BY TransactionNumber) AS EndId
    FROM TransactionJournal
) q
WHERE EndId <> TransactionNumber - 1
ORDER BY
    TransactionNumber
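The problems are easy to reproduce on a small SQLite sample (gaps scaled down from the question): the first row's StartId comes back NULL, the first range's true start (1) never appears, and the final range is dropped entirely:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TransactionJournal (TransactionNumber INTEGER PRIMARY KEY)")
ids = list(range(1, 11)) + [12, 14] + list(range(16, 21))  # 11, 13, 15 missing
con.executemany("INSERT INTO TransactionJournal VALUES (?)", [(i,) for i in ids])

# Example 2's query verbatim: inner LAG finds the previous value, the WHERE
# keeps only rows that follow a gap, and the outer LAG pairs them up.
rows = con.execute("""
    SELECT TransactionNumber - EndId AS GapSize,
           LAG(TransactionNumber) OVER (ORDER BY TransactionNumber) AS StartId,
           EndId
    FROM (SELECT TransactionNumber,
                 LAG(TransactionNumber) OVER (ORDER BY TransactionNumber) AS EndId
          FROM TransactionJournal) q
    WHERE EndId <> TransactionNumber - 1
    ORDER BY TransactionNumber
""").fetchall()
print(rows)  # [(2, None, 10), (2, 12, 12), (2, 14, 14)] -- no (16, 20) range
```

The NULL appears because the very first row has no LAG predecessor, and the last range (16-20) is lost because no later row exists to look back at it.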
PS: Some of the above code comes from other SO posters; I've only added to it.
1 Answer
I decided to ask ChatGPT-4, and here is what it wrote. There was an issue with my original post: it doesn't include the last range. LAG was not the correct choice.
This corrects the issue, and performance was under 1 second.
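The corrected query itself isn't reproduced in the post. One standard formulation that avoids LAG and naturally emits the final range is the ROW_NUMBER grouping trick from the classic "gaps and islands" pattern: within a contiguous run, TransactionNumber minus its row number is constant, so grouping by that difference collapses each run into one StartId/EndId row. A sketch, verified with SQLite via Python (sample data is illustrative; this is the generic technique, not necessarily the exact query the answerer received):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TransactionJournal (TransactionNumber INTEGER PRIMARY KEY)")
ids = list(range(1, 11)) + [12, 14] + list(range(16, 21))  # 11, 13, 15 missing
con.executemany("INSERT INTO TransactionJournal VALUES (?)", [(i,) for i in ids])

# TransactionNumber - ROW_NUMBER() is constant within each contiguous run,
# so GROUP BY that difference yields one row per run - including the last one.
rows = con.execute("""
    SELECT MIN(TransactionNumber) AS StartId,
           MAX(TransactionNumber) AS EndId
    FROM (SELECT TransactionNumber,
                 TransactionNumber
                   - ROW_NUMBER() OVER (ORDER BY TransactionNumber) AS grp
          FROM TransactionJournal) q
    GROUP BY grp
    ORDER BY StartId
""").fetchall()
print(rows)  # [(1, 10), (12, 12), (14, 14), (16, 20)]
```

Note the (16, 20) row: that is the final range the LAG-based version dropped, and there is no NULL StartId on the first row.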