Say I have two simple tables, Users and Posts, defined as follows:
CREATE TABLE [dbo].[Users](
    [UserID] [int] IDENTITY(1,1) NOT NULL,
    [Username] [varchar](255) NULL,
    [FirstName] [varchar](255) NULL,
    [LastName] [varchar](255) NULL,
    [DateOfBirth] [datetime] NULL,
    [Age] [int] NULL,
    CONSTRAINT [PK_Users] PRIMARY KEY CLUSTERED
    (
        [UserID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Posts](
    [PostID] [int] IDENTITY(1,1) NOT NULL,
    [Content] [varchar](max) NULL,
    [NumberOfLikes] [int] NULL,
    [UserID] [int] NULL,
    [CreateDateUTC] [datetime] NULL,
    [Tags] [varchar](max) NULL,
    CONSTRAINT [PK_Posts] PRIMARY KEY CLUSTERED
    (
        [PostID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
Obviously, when the database gets very large, performance tuning will become necessary. I figured that the column UserID in Posts is essential, because most of my queries filter by it, so I'll need to define an INDEX on that column. As for the covering fields, let's assume, for the purpose of this question, that all of my queries look the same (except for the WHERE part):
SELECT
    Posts.Content,
    Posts.NumberOfLikes,
    Users.Username
FROM Posts
INNER JOIN Users
    ON Posts.UserID = Users.UserID
WHERE Posts.UserID = @UserID;
My question is about the covering fields. I can easily define an index that covers Content and NumberOfLikes like this:
CREATE NONCLUSTERED INDEX [IX_Posts_UserID] ON [dbo].[Posts] (UserID) INCLUDE (Content, NumberOfLikes);
However, my query always joins with the Users table. So is my index still relevant (in terms of performance) even though my query selects more fields (from another table) than the index covers? I know that I cannot cover fields from another table, so how do I optimize the query in this case? When I looked at the execution plan, I saw that my index IX_Posts_UserID was in fact used (50%, and the other 50% by PK_Users), but I was puzzled as to how that happened, since I'm selecting columns that are not covered by the index.
So the ultimate question here is: how do table joins factor into SQL Server's decision of whether to use an index? Or, even simpler: how do joins affect indexes?
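(One way to see what the join actually costs against each index, besides reading the graphical plan, is to compare logical reads with STATISTICS IO. A sketch against the tables above; the `@UserID` value is just an illustrative placeholder:)

```sql
-- STATISTICS IO reports logical reads per table in the Messages tab,
-- showing how much I/O each side of the join incurs.
SET STATISTICS IO ON;

DECLARE @UserID int = 42;  -- hypothetical user id

SELECT
    Posts.Content,
    Posts.NumberOfLikes,
    Users.Username
FROM Posts
INNER JOIN Users
    ON Posts.UserID = Users.UserID
WHERE Posts.UserID = @UserID;

SET STATISTICS IO OFF;
```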
EDIT: Per Simonare's comment, below is the execution plan:
1 Answer
In general, it is advisable, at a minimum, to add a nonclustered index on all of your foreign keys, since they will often be used in JOIN operations (and occasionally in WHERE predicates).

To speak specifically to your situation: the index you've elected to create includes a VARCHAR(MAX) field, which affects how SQL Server decides to use it. Since a VARCHAR(MAX) column can theoretically grow to 2 GB of data, the engine won't necessarily store the field's data in-row at the page level, as a page is limited to 8 KB. In this case, SQL Server decided the least costly operation was to scan your index (which isn't always a bad thing, by the way, especially when the query has to touch a large fraction of the rows anyway).

My recommendation here: keep the index tight and limit it to the UserID field, to facilitate the performance of the join. I wouldn't necessarily worry about a covering index for your Content column, as the engine needs to dig deeper than the page level for that data anyway.

Just remember: indexes aren't magic, and they are definitely not a silver bullet for all performance problems. Properly designed, they can boost the efficiency of a system, but at a cost. Think of administrative staff in an office: they cost money to hire, but they add value back to the business in the form of efficiency.
As a complete aside, please do not store tags as a comma-separated list, which is what you appear to be doing here. Instead, store tags as a shared resource and link them to posts via a join table.
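A minimal sketch of that design, with table and column names of my own choosing (the Tags column on Posts would then be dropped):

```sql
-- Tags as a shared resource.
CREATE TABLE [dbo].[Tags](
    [TagID]   [int] IDENTITY(1,1) NOT NULL,
    [TagName] [varchar](255) NOT NULL,
    CONSTRAINT [PK_Tags] PRIMARY KEY CLUSTERED ([TagID])
);

-- Join table linking posts to tags (many-to-many).
CREATE TABLE [dbo].[PostTags](
    [PostID] [int] NOT NULL,
    [TagID]  [int] NOT NULL,
    CONSTRAINT [PK_PostTags] PRIMARY KEY CLUSTERED ([PostID], [TagID]),
    CONSTRAINT [FK_PostTags_Posts] FOREIGN KEY ([PostID]) REFERENCES [dbo].[Posts]([PostID]),
    CONSTRAINT [FK_PostTags_Tags]  FOREIGN KEY ([TagID])  REFERENCES [dbo].[Tags]([TagID])
);

-- All tags for a given post:
SELECT t.TagName
FROM dbo.PostTags AS pt
JOIN dbo.Tags AS t ON t.TagID = pt.TagID
WHERE pt.PostID = @PostID;
```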