I have two tables with the following schema:

Table A : ColumnA, UserId, ... - rest of the schema omitted for brevity
Table B : ColumnB, UserId, ... - rest of the schema omitted for brevity

The tables can have duplicate values between them. For e.g - Table A row (<some-columnA-value>, 1, ...) and Table B row (<some-columnB-value>, 1, ...) , 1 being the UserId.

Now, I have an API which is used to fetch all the UserId values from both tables. With increasing data, I want to now use pagination for this API and would like to modify the queries accordingly. There should also not be any duplicates over the pages or within a page.

How do I achieve this? Also a requirement is that I need to use keyset pagination rather than offset pagination since offset pagination gets slower as and when the offset increases.

So far, I have thought of using indexed views since there is only 1 column that I require to fetch but since the data keeps changing quite frequently and in large volumes, the overhead of maintaining the indexed view is not optimal.

Table A:
| Column A | UserId |
| ------------ | ------------ |
| x | 1 |
| y | 2 |
| z | 3 |
| w | 4 |

Table B:

Column B	UserId
a	1
b	3
c	5
d	6

Result (if no page size):

UserId
1
2
3
4
5
6

Result (if page size 3)

Page 1
| UserId |
| ------------ |
| 1 |
| 2 |
| 3 |

Page 2

UserId
4
5
6

If you only need to only need to retrieve pages in ascending sequence, Charlieface's answer is the Keyset Pagination solution that you want.

However, if you also have a need to support arbitrary navigation, the following extension of the Keyset Pagination technique will allow you to efficiently navigate the first, next, prior, and same page as previously accessed, and will also (at an increased cost) allow navigation to an arbitrary page.

In both cases, having an index or primary key with UserId as the first column is essential to an efficient execution using an index seek.

The code below contains separate queries to support the following scenarios:

First page - Selects the first @PageSize rows.
Next page - Selects the next @PageSize rows above the last maximum key value.
Prior page - Selects the previous @PageSize rows before the last minimum key.
Same page - Selects @PageSize rows at or above the last minimum key value.
Skip ahead N pages - Scans the next N * @PageSize rows above the last maximum key value, selecting only the last @PageSize rows.
Skip back N pages - Scans the prior N * @PageSize rows before the last minimum key value, selecting only the last (lowest) @PageSize rows.
Initial retrieval of page N - Scans the first N * @PageSize rows, selecting only the last @PageSize rows.

The last three operations would have an increased cost proportional to the number of pages skipped, but this is unavoidable without some kind of numbered key index or other.

Logic is also present to seek from the start if the requested page number is less than half the previous page number, since that would involve skipping fewer rows. Seek from start is also the fallback if any "prior" data values are unavailable (null) or invalid.

-- Procedure to retrieve paged UserIds using modified Keyset Pagination
-- Optimally supports first, next, prior, and same page retrieval.
-- Arbitrary page navigation minimizes discarded row retrieval.
-- Efficient execution depends on key being the first indexed column of some index.
CREATE PROCEDURE GetPagedUserIdsExtended
    @PageSize INT,
    @PageNumber INT,
    @PreviousPageNumber INT = NULL,
    @PreviousMinUserId INT = NULL,
    @PreviousMaxUserId INT = NULL
AS

DECLARE @Offset INT
  
IF (@PageNumber IS NULL
    OR @PageNumber <= 0
    OR @PageSize IS NULL
    OR @PageSize <= 0
)
BEGIN
    -- Invalid - Return empty result
    SELECT 0 AS UserId
END
  
ELSE IF (@PageNumber = 1
         OR @PreviousPageNumber IS NULL
         OR @PreviousPageNumber <= 0
         OR (@PreviousMinUserId IS NULL AND @PageNumber < @PreviousPageNumber)
         OR (@PreviousMaxUserId IS NULL AND @PageNumber >= @PreviousPageNumber)
         OR @PageNumber < @PreviousPageNumber / 2 -- Quicker to seek from start
)
BEGIN
    -- Seek relative to start
    SET @Offset = @PageSize * (@PageNumber - 1)

    SELECT U.UserId
    FROM (
        SELECT A.UserId
        FROM TableA A
        UNION ALL
        SELECT B.UserId
        FROM TableB B
    ) U
    GROUP BY U.UserId
    ORDER BY U.UserId
    OFFSET @Offset ROWS
    FETCH NEXT @PageSize ROWS ONLY
END

ELSE IF (@PageNumber = @PreviousPageNumber)
BEGIN
    -- Same page seek forward
    SET @Offset = 0

    SELECT U.UserId
    FROM (
        SELECT A.UserId
        FROM TableA A
        WHERE A.UserId >= @PreviousMinUserId
        UNION ALL
        SELECT B.UserId
        FROM TableB B
        WHERE B.UserId >= @PreviousMinUserId
    ) U
    GROUP BY U.UserId
    ORDER BY U.UserId
    OFFSET @Offset ROWS
    FETCH NEXT @PageSize ROWS ONLY
END

ELSE IF (@PageNumber > @PreviousPageNumber)
BEGIN
    -- Seek forward 1 or more pages
    SET @Offset = @PageSize * (@PageNumber - @PreviousPageNumber - 1)

    SELECT U.UserId
    FROM (
        SELECT A.UserId
        FROM TableA A
        WHERE A.UserId > @PreviousMaxUserId
        UNION ALL
        SELECT B.UserId
        FROM TableB B
        WHERE B.UserId > @PreviousMaxUserId
    ) U
    GROUP BY U.UserId
    ORDER BY U.UserId
    OFFSET @Offset ROWS
    FETCH NEXT @PageSize ROWS ONLY
END

ELSE IF (@PageNumber < @PreviousPageNumber)
BEGIN
    -- Seek backwards 1 or more pages
    SET @Offset = @PageSize * (@PreviousPageNumber - @PageNumber - 1)

    SELECT U.UserId
    FROM (
        SELECT U.UserId
        FROM (
            SELECT A.UserId
            FROM TableA A
            WHERE A.UserId < @PreviousMinUserId
            UNION ALL
            SELECT B.UserId
            FROM TableB B
            WHERE B.UserId < @PreviousMinUserId
        ) U
        GROUP BY U.UserId
        ORDER BY U.UserId DESC
        OFFSET @Offset ROWS
        FETCH NEXT @PageSize ROWS ONLY
    ) U
    ORDER BY U.UserId -- Reorder ascending
END

-- End of procedure

One slight difference from Charlieface's query is that the above uses a plain UNION (with no ALL), which de-dups the results, eliminating the need for GROUP BY . As far as I can tell, the execution plans and results are the same.

See this db<>fiddle for a demo that shows the results and execution plans for a variety of page seeks. For the purposes of the demo, the result queries were modified to also return some additional information to be displayed in the test results.

Note that if you hover over the Index Seek operations to view statistics, you can see that the "Actual Number of Rows" for each index seek is proportional to the page size * page navigation distance.

Also note that the single-page first, next, and prior navigation queries will effectively retrieve results using ... OFFSET 0 FETCH NEXT @PageSize ROWS ONLY , which produces the same execution plan as TOP (@PageSize) with no offset.

2条答案

按热度按时间

qnakjoqk1#

As I mention in my canonical post on Keyset Pagination (paging by key), the base query needs to be efficient.

The key (sic) in your case, is to use a Merge Union, group up by ID, and then take the top 3. This means that the grouping is over an already sorted set, and can use a Stream Aggregation.

So first UNION ALL the tables together, then GROUP BY UserId the final result and take TOP (3) .

SELECT TOP (3)
  t.UserId
FROM (
    SELECT a.UserId
    FROM TableA a
    UNION ALL
    SELECT b.UserId
    FROM TableB b
) t
GROUP BY t.UserId
ORDER BY UserId;

Final query plan is nice and neat. Note that the union is keeping the ordering correct for the later stages.

As I mention, you need to keep the previous highest ID value, and pass it in to the next query. Put it into both halves of the union.

SELECT TOP (3)
  t.UserId
FROM (
    SELECT a.UserId
    FROM TableA a
    WHERE a.UserId > @previousId
    UNION ALL
    SELECT b.UserId
    FROM TableB b
    WHERE b.UserId > @previousId
) t
GROUP BY t.UserId
ORDER BY UserId;

db<>fiddle

赞(0）回复(0）举报 12个月前

np8igboo2#

Pagination from two tables in SQL Server

2条答案

相关问题

热门标签

最新问答