SQL Server: Clustered index considerably slower than equivalent non-clustered index

bmvo0sr5  于 11个月前  发布在  SQL Server
关注(0)|答案(4)|浏览(126)

The Setup

What I am about to describe is run on the following hardware:

  • Disk: 6x 2TB HDD in RAID5 (w/ 1 redundant drive)
  • CPU: Intel Xeon E5-2640 @ 2.4 GHz, 6 Cores
  • RAM: 64 GB
  • SQL Server Version: SQL Server 2016 Developer

Both SQL Server Management Studio (SSMS) and the sql server instance are running on this server. So all queries are performed locally. In addition, before any queries are performed I always run the following command to ensure no data access is cached in memory:

DBCC DROPCLEANBUFFERS

The Problem

We have a SQL Server table with roughly 11'600'000 rows. In the large scheme of things, not a particularly large table, but it will grow considerably with time.

The table has the following structure:

CREATE TABLE [Trajectory](
  [Id] [int] IDENTITY(1,1) NOT NULL,
  [FlightDate] [date] NOT NULL,
  [EntryTime] [datetime2] NOT NULL,
  [ExitTime] [datetime2] NOT NULL,
  [Geography] [geography] NOT NULL,
  [GreatArcDistance] [real] NULL,
  CONSTRAINT [PK_Trajectory] PRIMARY KEY CLUSTERED ([Id])
)

(some columns have been excluded for simplicity, but amount and size of them is very small)

While there are not that many rows, the table takes up a considerably amount of disk-space because of the [Geography] column. The contents of this column are LINESTRINGS with roughly 3000 points (that includes Z and M values).

For now, consider that we simply have a clustered index on the Id column of table, which also represents the primary key constraint, as described in the DDL above.

The problem we have is that when we query the table for a date range and a specific geography intersection it takes a considerably amount of time to complete that query.

The query we're looking at looks like this:

DEFINE @p1 = [...]

SELECT [Id], [Geography]--, (+ some other columns)
WHERE [FlightDate] BETWEEN '2018-09-04' AND '2018-09-12' AND [Geography].STIntersects(@p1) = 1

This is a fairly simply query with the two filters that I mentioned above. In order to make query fast, we've tried a few different types of indices:

1. CREATE NONCLUSTERED INDEX [IX_Trajectory_FlightDate] ON [Trajectory] ([FlightDate] ASC)

When we query the table, after having added an index like this, the expectation is that the query plan looks like so:

  1. Perform an INDEX SEEK on the index (this operation filters the 11'600'000 rows down to roughly 50'000)
  2. Make a lookup to the main table to obtain the [Geography] column plus any additionally selected columns
  3. Perform the geography filter [Geography].STIntersects(@p1) = 1 on each row returned

This is also exactly what it does. Here's a snapshot of the actual query execution plan as seen in (SSMS):

This query takes a very long time to complete (can be measured in minutes, as seen in the above screenshot).

--- UPDATE 1 START ---

Additional Query Plan Info (for main steps, NOTE: Not same execution of query as shown above, so there is a variation in times. This query took 2:39):

  • SELECT.QueryTimeStats

  • CpuTime=12241ms

  • ElapsedTime=157591ms

  • Key Lookup (97%)

  • Actual I/O Statistics

  • Actual Logical Reads=48165

  • Actual Physical Reads=81

  • Actual Time Statistics

  • Actual Elapsed CPU Time=144ms

  • Actual Elapsed Time=266ms

  • Index Seek (0%)

  • Actual I/O Statistics

  • Actual Logical Reads=85

  • Actual Physical Reads=0

  • Actual Read Aheads=73

  • Actual Scans=21

  • Filter (3%)

  • Actual Time Statistics

  • Actual Elapsed CPU Time=12156ms

  • Actual Elapsed Time=157583ms

To me this more or less all of the time of this query is spent on IO. Why I cannot explain. I will add the following of interest as well:

  • It indicates that the step that took 3% of the time/resources took 157583ms, while the step that took 97% of the time/resources took 266ms. Which I find odd.
  • If I replace the STIntersect filter with a different filter that uses the EntryTime column instead (not indexed!), which roughly returns the same number of rows, then the query time is reduced to roughly 20 seconds, despite the fact I am still selecting the same number of rows. I guess the only explanation for this is that the query does not actually need to read the expensive [Geography] column before it can discard the row.

--- UPDATE 1 END ---

2. CREATE NONCLUSTERED INDEX [IX_Trajectory_FlightDate_Includes_Geography] ON [Trajectory] ([FlightDate] ASC) INCLUDE ([Geography])

This index is only different from the other index is that it stores the large [Geography] column together with the index. But the expectations regarding the query plan is more or less the same:

When we query the table, after having added an index like this, the expectation is that the query plan looks like so:

  1. Perform an INDEX SEEK on the index (this operation filters the 11'600'000 rows down to roughly 50'000)
  2. Perform the geography filter [Geography].STIntersects(@p1) = 1 on each row returned
  3. Make a lookup to the main table to obtain the additionally selected columns

This query takes less than 10 seconds. Here's the two query plans as seen in SSMS:

Note that in the above step 2 and 3 are switched around compared to the query with the other index (meaning it perform the lookup only after having fully completed the filtering, so it only makes the lookup to the main table roughly 1'000 times, as opposed to 50'000 times). Now this indicates to me what is actually taking time when performing this query is the lookup to the main table, not so much anything else, such as the INDEX SEEK or the FILTER.

Now maintaining an index like this is not ideally something we want to do, because it uses a considerable amount of space when we consider how large the [Geography] column in the table is and how much it is going to grow going forward. REBUILDING an index like this takes several hours.

--- UPDATE 2 START ---

Additional Query Plan Info:

  • SELECT.QueryTimeStats

  • CpuTime=11648ms

  • ElapsedTime=7533ms

  • Key Lookup (88%)

  • Actual I/O Statistics

  • Actual Logical Reads=1191

  • Actual Physical Reads=0

  • Actual Time Statistics

  • Actual Elapsed CPU Time=0ms

  • Actual Elapsed Time=0ms

  • Index Seek (3%)

  • Actual I/O Statistics

  • Actual Logical Reads=7119

  • Actual Physical Reads=4

  • Actual Read Aheads=6678

  • Actual Scans=21

  • Actual Time Statistics

  • Actual Elapsed CPU Time=104ms

  • Actual Elapsed Time=168ms

  • Filter (9%)

  • Actual Time Statistics

  • Actual Elapsed CPU Time=11535ms

  • Actual Elapsed Time=6888ms

Additional notes about statistics:

  • When diving into most of these numbers, they are split really well between the available threads.
  • My guess is that the main take-away from these statistics is that this query spends exactly zero time doing IO work during the "Key Lookup", which the other query had to do a lot of. I'm not exactly sure why this is so much better at this, given that it still has to find the additionally selected column (the one I am selecting that is not the [Geography] column. But since the filter is already applied before doing the lookup, it obviously has to do it a lot less. But even so, zero IO confuses me.
  • There's very few physical reads. All the needed data (including the [Geography] column is read from the Index Seek in just 4 physical reads.

--- UPDATE 2 END ---

3. Altering the table so it is clustered on ([FlightDate] ASC, [Id] ASC)

Now, given that partitioning is something we have considered doing with the table, we've also considered changing the clustered index such that it includes the [FlightDate]. Take a look at the following SQL DDL:

ALTER TABLE [Trajectory] DROP CONSTRAINT [PK_Trajectory]
ALTER TABLE [Trajectory] ADD CONSTRAINT [PK_Trajectory] PRIMARY KEY CLUSTERED ([FlightDate] ASC, [Id] ASC)
CREATE UNIQUE INDEX [AK_Trajectory] ON [Trajectory] ([Id] ASC)

This changes the table so it is now clustered on the [FlightDate], followed by the [Id], ensuring uniqueness. In addition we add an alternative key constraint on the [Id], so it in theory can still be used to reference the table.

These 3 sql statements takes several hours to complete, but an additional bonus with this is that it makes it very easy to create partitioning on the [FlightDate] in the future, allowing for partition elimination on all queries that are made against the table.

The expectation when we now execute the same query towards the table is that the query plan looks like so:

  1. Perform an CLUSTERED INDEX SEEK on the index (this operation filters the 11'600'000 rows down to roughly 50'000)
  2. Perform the geography filter [Geography].STIntersects(@p1) = 1 on each row returned

This is an even simpler query plan than the ones described in the previous examples, and in fact it does use this plan, as can be seen here:

The only problem? It takes roughly a minute to complete. But if we look closer at the query plan itself it also disputes the previous conclusion that what actually was taking time within the query was the lookup to the main table, because here it says the primary bulk of the time is spent filtering on the [Geography] column.

I have an additional comment to this that could be of interest: Even if I do not remove the index that I created in the previous section ( [IX_Trajectory_FlightDate_Includes_Geography] ), the query will be slow after changing the table structure like this. But if I hint at the query compiler that it should use the index from the previous section with the alternative key I just created in this step [AK_Trajectory] using WITH (INDEX([AK_Trajectory], [IX_Trajectory_FlightDate_Includes_Geography]) then the query will have roughly the same performance as in (2).

So the SQL Server actually actively decides to use a query plan that is slower, obviously thinking it is faster. And frankly, I do not blame it. I'd do the same because that query plan is so much simpler. What is going on?

Now, you may be rightfully wondering whether we have considered adding a SPATIAL INDEX to the [Geography] column. This has been a consideration. The problem with such an index (and why it cannot really be used) is two-fold:

  1. The [FlightDate] index is able to filter out a considerably larger amount of [Trajectory] rows than such an index would be. The crux of the issue is that the result of such a SPATIAL INDEX "SEEK" would grow linearly as the table grows, while the result of the INDEX SEEK on the [FlightDate] will not.
  2. Maintaining such a SPATIAL INDEX is expensive and insertion operations becomes slower and slower as the index grows larger.

--- UPDATE 3 START ---

Additional Query Plan Info (for main steps, NOTE: Not same execution of query as shown above, so there is a variation in times. This query took 0:49):

  • SELECT.QueryTimeStats

  • CpuTime=11818ms

  • ElapsedTime=48253ms

  • Parallelism (7%)

  • Actual Time Statistics

  • Actual Elapsed CPU Time=7ms

  • Actual Elapsed Time=47638ms

  • Clustered Index Seek (25%)

  • Actual I/O Statistics

  • Actual Logical Reads=7403

  • Actual Physical Reads=4

  • Actual Read Aheads=6939

  • Actual Scans=21

  • Actual Time Statistics

  • Actual Elapsed CPU Time=107ms

  • Actual Elapsed Time=57ms

  • Filter (69%)

  • Actual Time Statistics

  • Actual Elapsed CPU Time=11727ms

  • Actual Elapsed Time=48250ms

Worthy of note is that:

  • Logical reads in the clustered is spread out amongst all threads.
  • Read Aheads are not spread amongst threads (but neither is it with any of the other indices).
  • Not sure how any of this explains why this would be slower than index #2.

--- UPDATE 3 END ---

--- UPDATE 4 START ---

Lucky Brain suggested that it may be slower because data is actually stored in the ROW_OVERFLOW_DATA pages rather than in IN_ROW_PAGES . Here's a closer look at how data is actually stored in the table, queried using the following query:

SELECT  
  OBJECT_SCHEMA_NAME(p.object_id) table_schema, 
  OBJECT_NAME(p.object_id) table_name, 
  p.index_id, 
  p.partition_number, 
  au.allocation_unit_id, 
  au.type_desc, 
  au.total_pages, 
  au.used_pages, 
  au.data_pages 
FROM sys.system_internals_allocation_units au 
JOIN sys.partitions p 
ON au.container_id = p.partition_id 
WHERE OBJECT_NAME(p.object_id) = 'Trajectory' 
ORDER BY table_schema, table_name, p.index_id, p.partition_number, au.type;

This gives information about how data is stored for the main table (clustered index) and each other index. The result of this is:

  • Clustered Index

  • IN_ROW_DATA: total_pages=705137, used_pages=705137, data_pages=697811

  • LOB_DATA: total_pages=10302796, used_pages=10248361, data_pages=0

  • ROW_OVERFLOW_DATA: total_pages=9, used_pages=2, data_pages=0

  • Index #2

  • IN_ROW_DATA: total_pages=497639, used_pages=494629, data_pages=496531

  • LOB_DATA: total_pages=10219824, used_pages=10217546, data_pages=0

  • ROW_OVERFLOW_DATA: ------------------------------------------------------------

As can be seen from this, is that while data is not exactly stored in ROW_OVERFLOW_DATA , it is not stored in IN_ROW_PAGES either. That being said, I don't think there's any reason to assume that retrieving data from LOB_DATA is supposed to be faster than ROW_OVERFLOW_DATA . Reading up on these types a bit, it becomes obvious that this data must be stored in LOB_DATA given that individual columns generally exceed the 8kB max for ROW_OVERFLOW_DATA .

But as can also be seen from the above, both the main table (clustered index) and index #2 uses LOB_DATA pages, so I am not entirely sure why the index #2 would be so much faster, unless LOB_DATA means something different to an index, when compared to a clustered index.

But I feel like everything I have seen bolsters the same conclusion:

  1. When the query has to make a lookup to the main table that includes data in LOB_DATA , that lookup is invariably very slow (even if it is performed as part of an INDEX SEEK on the clustered index). Basically every query that I've made (fast or slow) suggests this. For example, consider the index #1:

  2. With the original query, it will have to make the Key Lookup ~50'000 times, taking close to 3 minutes to complete.

  3. If I change the query so it filters on the [EntryTime] (example already explained in UPDATE 1 ) in such a way the result set remains roughly equal (about ~1'000 rows), then the query suddenly takes about 20 seconds. Such a change means it only has to make the lookup to the LOB_DATA pages in the main table for the actual result set, as opposed to all 50'000 entries that were seeked out in index #1. (The important note here being that it still has to make a Key Lookup to the main table for all, it just does not need to go to the LOB_DATA for each entry.)

  4. But while 20 seconds is a lot faster than 3 minutes, it still falls short of the original query when performed with index #2 (which has to touch all 50'000 [Geometry] values!). And right now I feel like the only logical explanation for this is that the lookup to the main table's LOB_DATA somehow slows down the query significantly.

  5. I would say that this can explain the fairly significant difference in performance between index #1 and index #2. Less so, the difference between index #2 and the clustered index #3.

--- UPDATE 4 END ---

--- UPDATE 5 START ---

The previous update included physical page stats for two of the indices, here's the same stats for the first index:

  • Index #1

  • IN_ROW_DATA: total_pages=18705, used_pages=18698, data_pages=18659

  • LOB_DATA: ------------------------------------------------------------

  • ROW_OVERFLOW_DATA: ------------------------------------------------------------

Obviously, this does not include LOB_DATA or ROW_OVERFLOW_DATA . But more suprisingly is the fact that the IN_ROW_DATA uses significantly (to the order of about 20-30) less pages than index #2. Which would indicate, as suggested by Lucky Brain, that when a spatial column is included in an index, SQL Server may store some information about that geometry/geography, such as a bounding box, directly in the IN_ROW_DATA in order to quickly perform geometry operations.

This of course, assumes that the table does not do this for "normal" spatial columns, when part of a clustered index.

--- UPDATE 5 END ---

The Questions

Can anyone answer these two questions:

  • Can it be true that the simple lookup operation can explain the difference in performance between the indices described in (1) and (2).
  • Why is the clustered index described in (3) considerable slower than the index described in (2)?
  • If neither of the two above can be answered, should we see such a big performance deficit when comparing such two indicies as described in question 1 and 2 or is it more likely something else is wrong with our setup?
lqfhib0f

lqfhib0f1#

Looking at your query the first thing to consider is that you include in the SELECT list a spacial column which is a .NET/CLR data type and these are stored outside the IN_ROW_DATA pages requiring key look-up unless the spacial column is included in the index which potentially also includes the spacial bounding-box in the index data pages to speed up the filtering saving most of the disk I/O. I would say you uncovered an efficient trick to speed up spacial columns filtering without the need of an spacial index.

To prove my point I refer you to the original SQL documentation, which I'm sure you already know, about covering indexes where it clarifies the following: "Non-key columns added to the leaf level of a nonclustered index to improve query performance. This allows the query optimizer to locate all the required information from an index scan; the table or clustered index data is not accessed.". The last part is very important here, so I assume the bounding box is part of the "required information" of a spacial column to help the query optimizer avoid accessing the IN_ROW_DATA .

Conclusion:

  1. Can it be true that the simple lookup operation can explain the difference in performance between the indices described in (1) and (2)? I would say so because of the spacial CLR data type being stored outside the IN_ROW_DATA pages requiring much more disk I/O in (1).
  2. Why is the clustered index described in (3) considerable slower than the index described in (2)? Same reason, including the Geography data in the index (2) saves the need to look that up outside the IN_ROW_DATA pages saving most of the disk I/O; bear in mind that index (3) still needs to look-up the spacial data in the LOB_DATA.
  3. If neither of the two above can be answered, should we see such a big performance deficit when comparing such two indicies as described in question 1 and 2 or is it more likely something else is wrong with our setup? N/A.
vbkedwbf

vbkedwbf2#

Perhaps you can add a column for the bounding box to do a light weight check before doing the full test of the point in the polygon. If it turns out you can eliminate most records using the bounding box, then it will save time.

DECLARE @g geography,
    @b_box geography,
    @pIn  geography,
    @pOut geography

SET @g = geography::STPolyFromText('POLYGON((-91.119987 40.705402, -91.129158 40.682148, -91.162498 40.656311, -91.214912 40.643818, -91.262062 40.639545, -91.375610 40.603439, -91.411118 40.572971, -91.412872 40.547993, -91.382103 40.528496, -91.374794 40.503654, -91.385399 40.447250, -91.372757 40.402988, -91.385757 40.392361, -91.418816 40.386875, -91.448593 40.371902, -91.476883 40.390968, -91.490158 40.390762, -91.500221 40.405117, -91.527534 40.410126, -91.529449 40.435043, -91.538689 40.441246, -91.533051 40.455399, -91.579224 40.463718, -91.585869 40.484478, -91.616699 40.504833, -91.622375 40.532864, -91.691917 40.551640, -91.689796 40.581165, -91.716812 40.593399, -91.741547 40.609749, -91.946198 40.608234, -92.192993 40.600060, -92.361328 40.599548, -92.646240 40.591438, -92.717621 40.589645, -93.100731 40.584335, -93.370056 40.580482, -93.562691 40.580807, -93.786079 40.578445, -94.017830 40.574024, -94.238159 40.570972, -94.484993 40.574215, -94.639633 40.575756, -94.920364 40.577229, -95.217171 40.581898, -95.382294 40.584335, -95.767204 40.589046, -95.757271 40.620903, -95.767723 40.643116, -95.876335 40.730434, -95.851509 40.792599, -95.846153 40.848331, -95.834114 40.870300, -95.836258 40.901108, -95.837318 40.974258, -95.860611 41.002651, -95.859253 41.035004, -95.878517 41.065872, -95.857986 41.109188, -95.876396 41.164204, -95.859512 41.166866, -95.858910 41.180538, -95.915810 41.194065, -95.921959 41.207855, -95.910690 41.225246, -95.929939 41.302059, -95.910912 41.308472, -95.897301 41.286865, -95.888817 41.301392, -95.942604 41.340080, -95.939766 41.394810, -95.934776 41.462387, -95.952896 41.472393, -96.006607 41.481960, -96.013161 41.493000, -95.996399 41.511524, -95.993675 41.528111, -96.004303 41.536671, -96.049881 41.524342, -96.085548 41.537529, -96.091644 41.563152, -96.080544 41.576008, -96.111015 41.599014, -96.099014 41.654690, -96.119972 41.684105, -96.121910 41.694923, -96.085266 41.704998, -96.099480 41.731575, -96.099030 41.752987, -96.076126 41.791481, -96.135330 41.862633, -96.159676 41.904163, -96.145576 41.924919, -96.147034 41.966267, -96.184921 41.980698, -96.202545 41.996628, -96.235794 42.001270, -96.238426 42.028450, -96.265182 42.048908, -96.284821 42.123463, -96.351860 42.168194, -96.363205 42.214050, -96.337402 42.229530, -96.332352 42.260315, -96.342575 42.282089, -96.368393 42.298031, -96.389473 42.328796, -96.423866 42.349285, -96.411453 42.380924, -96.417320 42.414783, -96.397583 42.441799, -96.395767 42.467407, -96.439087 42.489246, -96.479935 42.517136, -96.489029 42.564034, -96.500633 42.573891, -96.488190 42.580486, -96.512535 42.629761, -96.540855 42.662411, -96.562729 42.668518, -96.626228 42.708359, -96.640396 42.748608, -96.632668 42.776840, -96.600563 42.799564, -96.587334 42.835388, -96.572815 42.834354, -96.555901 42.846668, -96.537201 42.896915, -96.543953 42.913876, -96.514626 42.952393, -96.516838 42.986469, -96.498711 43.012062, -96.519699 43.051521, -96.479263 43.061897, -96.461784 43.075596, -96.460495 43.087887, -96.451195 43.126324, -96.472801 43.209099, -96.486931 43.217926, -96.558289 43.225506, -96.566673 43.239651, -96.559250 43.253281, -96.570404 43.263630, -96.578812 43.290092, -96.540245 43.307678, -96.522575 43.356987, -96.524734 43.384247, -96.557388 43.400749, -96.588791 43.435562, -96.583473 43.481945, -96.597992 43.499874, -96.460136 43.499744, -96.060738 43.498566, -95.866615 43.498978, -95.464493 43.499577, -95.396278 43.500370, -94.920197 43.499413, -94.859573 43.500072, -94.454987 43.498146, -94.246544 43.498993, -93.973717 43.500343, -93.653481 43.500809, -93.500618 43.500534, -93.054184 43.501495, -93.027016 43.501316, -92.557831 43.500294, -92.452995 43.499496, -92.077370 43.499187, -91.730217 43.499603, -91.610954 43.500656, -91.223434 43.500835, -91.235771 43.464710, -91.210785 43.424076, -91.198112 43.370537, -91.176918 43.353970, -91.078369 43.313320, -91.066299 43.280704, -91.068924 43.257919, -91.161224 43.147594, -91.168442 43.082905, -91.159622 43.081200, -91.152084 43.001331, -91.138992 42.925907, -91.093300 42.871452, -91.081902 42.783375, -91.066040 42.744923, -90.999054 42.707066, -90.919281 42.680683, -90.892418 42.678246, -90.745483 42.657005, -90.694664 42.637932, -90.664253 42.571392, -90.639091 42.555714, -90.625580 42.528561, -90.638329 42.509361, -90.651772 42.494698, -90.648346 42.475643, -90.605827 42.460560, -90.563583 42.421837, -90.491043 42.388783, -90.441597 42.360073, -90.427681 42.340633, -90.417984 42.263924, -90.407173 42.242645, -90.367729 42.210209, -90.323601 42.197319, -90.230934 42.159721, -90.191574 42.122688, -90.176086 42.120502, -90.166649 42.103745, -90.168098 42.061043, -90.150536 42.033428, -90.142670 41.983963, -90.154518 41.930775, -90.195839 41.806137, -90.255310 41.781738, -90.304886 41.756466, -90.326027 41.722736, -90.341133 41.649090, -90.339348 41.602798, -90.348366 41.586849, -90.423004 41.567272, -90.434967 41.543579, -90.454994 41.527546, -90.540840 41.525970, -90.600700 41.509586, -90.658791 41.462318, -90.708214 41.450062, -90.779900 41.449821, -90.844139 41.444622, -90.949654 41.421234, -91.000694 41.431084, -91.027489 41.423508, -91.055786 41.401379, -91.073280 41.334896, -91.102348 41.267818, -91.101524 41.231522, -91.056320 41.176258, -91.018257 41.165825, -90.990341 41.144371, -90.957787 41.104359, -90.954651 41.070362, -90.960709 40.950504, -90.983276 40.923927, -91.049210 40.879585, -91.088905 40.833729, -91.092751 40.761547, -91.119987 40.705402))', 4326);

SET @g = @g.ReorientObject()

DECLARE @g_flat_box geometry = geometry::STGeomFromWKB(@g.STAsBinary(), @g.STSrid).STEnvelope();
DECLARE @g_rnd_box geography = geography::STGeomFromWKB(@g_flat_box.STAsBinary(), @g_flat_box.STSrid);

SELECT @g as [poly], @g_rnd_box as [box]

SET @pIn = geography::STPointFromText('POINT( -90.6204165 41.5795478)',4326)
SET @pOut = geography::STPointFromText('POINT( -80.6204165 31.5795478)',4326)

SELECT STIntersectionIn    = @g_rnd_box.STIntersection( @pIn ).ToString(),
       STIntersectionOut   = @g_rnd_box.STIntersection( @pOut ).ToString(),
       STIntersectionIn    = @g.STIntersection( @pIn ).ToString(),
       STIntersectionOut   = @g.STIntersection( @pOut ).ToString()
GO

I got the data for this example from https://www.sqlservercentral.com/forums/topic/sql-geography-data-type-test-if-point-is-inside-a-closed-polygon

ruarlubt

ruarlubt3#

It's all about disk hits.

Without a Geometry index, 50K Geometries need to be fetched and tested. This is 50K disk hits. With HDD, a simple Rule of Thumb is 100/second. Hence 500 seconds. See your #3: "Perform the geography filter [Geography].STIntersects(@p1) = 1 on each row returned".

To speed it up, you must use a Spatial index and hope that it will do better than the date filter.

Your #2 says "INCLUDE GEOMETRY". That would seem to make the index so bulky as to not really help. As you noted "more or less the same".

A Clustered index on (FlightDate, id), in that order would cut back on the disk hits ifFlightDate is the optimal starting point. (I do not know whether a Geometry index will be better.)

So, have two indexes: Clustered (FlightDate, id) and Geometry (geometry). And hope that the Optimizer will pick whichever is better.

k2arahey

k2arahey4#

Just my 2 cents; you could try partitioning the large table into, for examle, a table per year. Or even a table per month - whatever is the best fit for your use case. Speeds up everything, you could even parallellize multi-table spanning queries in your DAL layer.

相关问题