SQL Server How do joins affect what indexes are used?

ifsvaxew  于 2023-11-16  发布在  其他
关注(0)|答案(1)|浏览(139)

Say I have two simple tables Users and Posts defined as the following:

CREATE TABLE [dbo].[Users](
    [UserID] [int] IDENTITY(1,1) NOT NULL,
    [Username] [varchar](255) NULL,
    [FirstName] [varchar](255) NULL,
    [LastName] [varchar](255) NULL,
    [DateOfBirth] [datetime] NULL,
    [Age] [int] NULL,
CONSTRAINT [PK_Users] PRIMARY KEY CLUSTERED 
(
    [UserID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE TABLE [dbo].[Posts](
    [PostID] [int] IDENTITY(1,1) NOT NULL,
    [Content] [varchar](max) NULL,
    [NumberOfLikes] [int] NULL,
    [UserID] [int] NULL,
    [CreateDateUTC] [datetime] NULL,
    [Tags] [varchar](max) NULL,
 CONSTRAINT [PK_Posts] PRIMARY KEY CLUSTERED 
(
    [PostID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO

Obviously when the database gets very large, performance tuning will become necessary. I figured that the column UserID in Posts is very essential because most of my queries filter by it. So I figured that I'll need to define an INDEX on that column. As for the covering fields, let's assume, for the purpose of this question, that all of my queries look the same (except for the WHERE part):

SELECT
  Posts.Content
 ,Posts.NumberOfLikes
 ,Users.UserName
FROM
  Posts
INNER JOIN
  Users
    ON
    Posts.UserID = Users.UserID
WHERE
  Posts.UserID = @UserID;

My question is about the covering fields. I can easily define an index that covers Content and NumberOfLikes like this:

CREATE NONCLUSTERED INDEX [IX_Posts_UserID] ON [dbo].[Posts] (UserID) INCLUDE (Content, NumberOfLikes)

However, my query always joins with the Users table. So is my index still relevant (in terms of performance) despite that my query includes more fields (from another table) than the covering fields of the index? I know that I cannot cover fields from another table so how do I optimize the query in this case? When I looked at the execution plan, I saw that my index IX_Posts_UserID was in fact used (50%, and another 50% by PK_Users ) but I was puzzled as to how that happened since I'm selecting columns that are not covered by the index.

So the ultimate question here is: how do table joins factor in the decision of whether indexes are used by SQL Server? Or even simpler, how do joins affect indexes?

EDIT: Per Simonare's comment, below is the execution plan:

tf7tbtn2

tf7tbtn21#

In general, it is advisable to, at a minimum, be adding nonclustered index on all your foreign keys since they'll likely often be used in JOIN operations (and occasionally in WHERE predicates).

To talk specifically to your situation here, the index you've elected to create contains a VARCHAR(MAX) field, which will affect how SQL Server decides to use it. Since a VARCHAR(MAX) can theoretically grow to contain 2GB of data, the engine won't store the field data at the page level as it's limited to 8KB. In this case SQL Server decided the least costly operation was to scan your index (which isn't always a bad thing by the way, especially if selectivity is high).

My recommendation here, keep the index tight and limit it to the UserId field to facilitate the performance of the join. I wouldn't necessarily worry about a covering index here for your Content column as the engine needs to dig deeper than the page-level for this data anyways.

create nonclustered index ix_posts_userid on dbo.Posts (UserID);

Just remember, indexes aren't magic and definitely aren't a silver bullet to all performance problems. Properly designed they can boost the efficiency of a system at a cost. Think "administrative staff in an office-place", they cost money to hire. But add a value back to the business with respect to efficiency.

As a complete aside, please do not store tags as a comma-separated list, which it seems like you're doing here.

Instead, store tags as a shared resource and link via a "join table".

create table Tags (
    TagId int identity primary key
    ,Content nvarchar(128) not null -- or whatever width suits your needs
);

create table PostTags (
    PostId int not null
    ,TagId int not null
    ,primary key (PostId, TagId)
);

相关问题