I have two tables with the following schema:
Table A
:ColumnA, UserId,
... - rest of the schema omitted for brevityTable B
:ColumnB, UserId,
... - rest of the schema omitted for brevity
The tables can have duplicate values between them. For e.g - Table A row (<some-columnA-value>, 1, ...)
and Table B row (<some-columnB-value>, 1, ...)
, 1 being the UserId.
Now, I have an API which is used to fetch all the UserId
values from both tables. With increasing data, I want to now use pagination for this API and would like to modify the queries accordingly. There should also not be any duplicates over the pages or within a page.
How do I achieve this? Also a requirement is that I need to use keyset pagination rather than offset pagination since offset pagination gets slower as and when the offset increases.
So far, I have thought of using indexed views since there is only 1 column that I require to fetch but since the data keeps changing quite frequently and in large volumes, the overhead of maintaining the indexed view is not optimal.
Table A:
| Column A | UserId |
| ------------ | ------------ |
| x | 1 |
| y | 2 |
| z | 3 |
| w | 4 |
Table B:
Column B | UserId |
---|---|
a | 1 |
b | 3 |
c | 5 |
d | 6 |
Result (if no page size):
UserId |
---|
1 |
2 |
3 |
4 |
5 |
6 |
Result (if page size 3)
Page 1
| UserId |
| ------------ |
| 1 |
| 2 |
| 3 |
Page 2
UserId |
---|
4 |
5 |
6 |
2条答案
按热度按时间qnakjoqk1#
As I mention in my canonical post on Keyset Pagination (paging by key), the base query needs to be efficient.
The key (sic) in your case, is to use a Merge Union, group up by ID, and then take the top 3. This means that the grouping is over an already sorted set, and can use a Stream Aggregation.
So first
UNION ALL
the tables together, thenGROUP BY UserId
the final result and takeTOP (3)
.Final query plan is nice and neat. Note that the union is keeping the ordering correct for the later stages.
As I mention, you need to keep the previous highest ID value, and pass it in to the next query. Put it into both halves of the union.
db<>fiddle
np8igboo2#
If you only need to only need to retrieve pages in ascending sequence, Charlieface's answer is the Keyset Pagination solution that you want.
However, if you also have a need to support arbitrary navigation, the following extension of the Keyset Pagination technique will allow you to efficiently navigate the first, next, prior, and same page as previously accessed, and will also (at an increased cost) allow navigation to an arbitrary page.
In both cases, having an index or primary key with
UserId
as the first column is essential to an efficient execution using an index seek.The code below contains separate queries to support the following scenarios:
The last three operations would have an increased cost proportional to the number of pages skipped, but this is unavoidable without some kind of numbered key index or other.
Logic is also present to seek from the start if the requested page number is less than half the previous page number, since that would involve skipping fewer rows. Seek from start is also the fallback if any "prior" data values are unavailable (null) or invalid.
One slight difference from Charlieface's query is that the above uses a plain
UNION
(with no ALL), which de-dups the results, eliminating the need forGROUP BY
. As far as I can tell, the execution plans and results are the same.See this db<>fiddle for a demo that shows the results and execution plans for a variety of page seeks. For the purposes of the demo, the result queries were modified to also return some additional information to be displayed in the test results.
Note that if you hover over the Index Seek operations to view statistics, you can see that the "Actual Number of Rows" for each index seek is proportional to the page size * page navigation distance.
Also note that the single-page first, next, and prior navigation queries will effectively retrieve results using
... OFFSET 0 FETCH NEXT @PageSize ROWS ONLY
, which produces the same execution plan asTOP (@PageSize)
with no offset.