SQL Server Should every User Table have a Clustered Index?

prdp8dxp  于 2023-08-02  发布在  其他
关注(0)|答案(6)|浏览(107)

Recently I found a couple of tables in a Database with no Clustered Indexes defined. But there are non-clustered indexes defined, so they are on HEAP.

On analysis I found that select statements were using filter on the columns defined in non-clustered indexes.

Not having a clustered index on these tables affect performance?

ajsxfq5m

ajsxfq5m1#

It's hard to state this more succinctly than SQL Server MVP Brad McGehee :
As a rule of thumb, every table should have a clustered index. Generally, but not always, the clustered index should be on a column that monotonically increases–such as an identity column, or some other column where the value is increasing–and is unique. In many cases, the primary key is the ideal column for a clustered index.

BOL echoes this sentiment:

With few exceptions, every table should have a clustered index.

The reasons for doing this are many and are primarily based upon the fact that a clustered index physically orders your data in storage.

  • If your clustered index is on a single column monotonically increases, inserts occur in order on your storage device and page splits will not happen.
  • Clustered indexes are efficient for finding a specific row when the indexed value is unique, such as the common pattern of selecting a row based upon the primary key.
  • A clustered index often allows for efficient queries on columns that are often searched for ranges of values ( between , > , etc.).
  • Clustering can speed up queries where data is commonly sorted by a specific column or columns.
  • A clustered index can be rebuilt or reorganized on demand to control table fragmentation.
  • These benefits can even be applied to views .

You may not want to have a clustered index on:

  • Columns that have frequent data changes, as SQL Server must then physically re-order the data in storage.
  • Columns that are already covered by other indexes.
  • Wide keys, as the clustered index is also used in non-clustered index lookups.
  • GUID columns, which are larger than identities and also effectively random values (not likely to be sorted upon), though newsequentialid() could be used to help mitigate physical reordering during inserts.
  • A rare reason to use a heap (table without a clustered index) is if the data is always accessed through nonclustered indexes and the RID (SQL Server internal row identifier) is known to be smaller than a clustered index key.

Because of these and other considerations, such as your particular application workloads, you should carefully select your clustered indexes to get maximum benefit for your queries.

Also note that when you create a primary key on a table in SQL Server, it will by default create a unique clustered index (if it doesn't already have one). This means that if you find a table that doesn't have a clustered index, but does have a primary key (as all tables should), a developer had previously made the decision to create it that way. You may want to have a compelling reason to change that (of which there are many, as we've seen). Adding, changing or dropping the clustered index requires rewriting the entire table and any non-clustered indexes, so this can take some time on a large table.

ubbxdtey

ubbxdtey2#

I would not say "Every table should have a clustered index", I would say "Look carefully at every table and how they are accessed and try to define a clustered index on it if it makes sense". It's a plus, like a Joker, you have only one Joker per table, but you don't have to use it. Other database systems don't have this, at least in this form, BTW.

Putting clustered indices everywhere without understanding what you're doing can also kill your performance (in general, the INSERT performance because a clustered index means physical re-ordering on the disk, or at least it's a good way to understand it), for example with GUID primary keys as we see more and more.

So, read Tim Lehner's exceptions and reason.

velaa5lx

velaa5lx3#

Performance is a big hairy problem. Make sure you are optimizing for the right thing.

Free advice is always worth its price, and there is no substitute for actual experimentation.

The purpose of an index is to find matching rows and help retrieve the data when found.

A non-clustered index on your search criteria will help to find rows, but there needs to be additional operation to get at the row's data.

If there is no clustered index, SQL uses an internal rowId to point to the location of the data.

However, if there is a clustered index on the table, that rowId is replaced by the data values in the clustered index.

So the step of reading the rows data would not be needed, and would be covered by the values in the index.

Even if a clustered index isn't very good at being selective, if those keys are frequently most or all of the results requested - it may be helpful to have them as the leaf of the non-clustered index.

bakd9h0s

bakd9h0s4#

Yes you should have clustered index on a table.So that all nonclustered indexes perform in better way.

nbnkbykc

nbnkbykc5#

Consider using a clustered index when Columns that contain a large number of distinct values so to avoid the need for SQL Server to add a "uniqueifier" to duplicate key values

Disadvantage : It takes longer to update records if only when the fields in the clustering index are changed.

Avoid clustering index constructions where there is a risk that many concurrent inserts will happen on almost the same clustering index value

Searches against a nonclustered index will appear slower is the clustered index isn't build correctly, or it does not include all the columns needed to return the data back to the calling application. In the event that the non-clustered index doesn't contain all the needed data then the SQL Server will go to the clustered index to get the missing data (via a lookup) which will make the query run slower as the lookup is done row by row.

dy1byipe

dy1byipe6#

Yes, every table should have a clustered index. The clustered index sets the physical order of data in a table. You can compare this to the ordering of music at a store, by bands name and or Yellow pages ordered by a last name. Since this deals with the physical order you can have only one it can be comprised by many columns but you can only have one.

It’s best to place the clustered index on columns often searched for a range of values. Example would be a date range. Clustered indexes are also efficient for finding a specific row when the indexed value is unique. Microsoft SQL will place clustered indexes on a PRIMARY KEY constraint automatically if no clustered indexes are defined.

Clustered indexes are not a good choice for:

Columns that undergo frequent changes

  • This results in the entire row moving (because SQL Server must keep the data values of a row in physical order). This is an important consideration in high-volume transaction processing systems where data tends to be volatile.

Wide keys

  • The key values from the clustered index are used by all nonclustered indexes as lookup keys and therefore are stored in each nonclustered index leaf entry.

相关问题