How to avoid union in SQL Server for this query?

yc0p9oo0  于 2023-08-02  发布在  SQL Server
关注(0)|答案(6)|浏览(118)

I have a union statement in the query which needs to eliminated as its performance hit because of the millions of records found in below tables.

How can I achieve using left join so that performance is not compromised?

The difference between 2 select statements here is that when M.Id<>0 then U.UserId is used in 1st SELECT statement and when M.Id=0 then '' is returned in 2nd SELECT statement as I'm not using User table in it . I'm using SQL Server 2016

Select 
    U.UserId, A.ActivityPlace 
From 
    UserTable U
Inner Join 
    MasterTable M ON M.Id = U.UserId
Inner Join 
    ActivityTable A ON A.ActivityID = M.UserId
Where 
    M.Id <> 0

Union

Select 
    '', A.ActivityPlace 
From 
    MasterTable M 
Inner Join 
    ActivityTable A ON A.ActivityID = M.UserId
Where 
    M.Id = 0
of1yzvn4

of1yzvn41#

You can try to move condition into the CASE expression.

Select 
    CASE WHEN M.Id<>0 THEN U.UserId ELSE 0 END AS UserId
    ,A.ActivityPlace 
From 
    MasterTable AS M
    INNER JOIN ActivityTable AS A ON A.ActivityID = M.UserId
    LEFT JOIN UserTable AS U ON M.Id = U.UserId
;

Since you didn't show how the tables are related, it may or may not produce correct result. It is hard to tell without knowing how the tables are related.

pbossiut

pbossiut2#

In that first query you get userid, in secon query you won't get userid. Hence this 2 set of result is unique. so you can use union all.

Select U.UserId, A.ActivityPlace From UserTable U
Inner Join MasterTable  M ON M.Id=U.UserId
Inner Join ActivityTable A ON A.ActivityID=M.UserId
Where M.Id<>0

Union all

Select '', A.ActivityPlace From MasterTable  
M Inner Join ActivityTable A ON A.ActivityID=M.UserId
Where M.Id=0
vshtjzan

vshtjzan3#

I think you can do below

Select 
U.UserId, A.ActivityPlace 
From 
UserTable U
Inner Join 
MasterTable M ON (M.Id = U.UserId or M.Id = 0)
Inner Join 
ActivityTable A ON A.ActivityID = M.UserId
j2datikz

j2datikz4#

I think I would recommend:

Select U.UserId, A.ActivityPlace 
From MasterTable M left join
     UserTable U  
     on M.Id = U.UserId and m.id <> 0 left join
     ActivityTable A 
     on A.ActivityID = M.UserId;

This is not 100% equivalent to your query. But with some assumptions:

  • The UNION is not actually eliminating duplicates.
  • NULL is a reasonable replacement for '' .
  • All but 0 user ids match

Then it should do what you want. Actually, for the third condition, you could add:

where m.id = 0 or u.userid is not null

Actually, I assume that the special treatment for m.id = 0 is simply because there is no user. In that case, you just want left join s:

Select m.Id as UserId, A.ActivityPlace 
From MasterTable M left join
     UserTable U  
     on M.Id = U.UserId left join
     ActivityTable A 
     on A.ActivityID = M.UserId;
wydwbb8l

wydwbb8l5#

I have a union statement in the query which needs to eliminated as its performance hit because of the millions of records found in below tables.

How can I achieve using left join so that performance is not compromised?

Unfortunately you did not provide any useful information, beside the fact that you have millions of records in some tables.

Here is something that would have helped us reproduce your problem and work out a possible solution:

  • posting the full structure of the tables including indexes
  • showing some sample data
  • posting an example showing expected results

The obvious answer is: check the execution plan and if you don't understand it, post it along with the information mentioned above.

I am afraid that the answers you've got are speculative. In spite of the best efforts of participants there is no guarantee that any of these answers will address your performance problem adequately. There is simply not enough information.

The replies you've got are quite telling:

  • "You can try to move condition into the CASE expression"
  • "I think I would recommend"
  • "I think you can do below"

Don't blame the members, you didn't give enough useful details so all they can do is guess.

One tip anyway: check your indexes, especially on the fields that are JOINed together. Since you are using SQL Server you can have a look at this guide for example: Make Sure All JOIN Columns are Indexed . This advice is equally valid for other relational DBMSes, although each has its own query optimizer and peculiarities.

I am actually surprised that nobody asked you about your indexes. Very often, it is indeed possible to rewrite an existing SQL statement in a way that makes it more performant. But you have to go to the root of the problem and consider the lower layers too, that is the data.

In simplified terms, if you have no index, and barring any cache or optimization tips, the database engine has to do a full table scan to get the results. If you have lots of records, that obviously takes time. If you join tables, the load further increases. The solution: good table structure, a sound data model and proper indexes.

An index is not a magic bullet though, if you have a poor table structure it will not be as effective as it should be.

If that still doesn't help, then I would suggest that you either add more details or post the question again later with all the relevant details. As a guideline: How do I ask a good question?

thtygnil

thtygnil6#

UNION command is a costly operation. Reason:

  1. It has to fully execute one select query
  2. fetch the rows into the cache, then do a deduplicate
  3. Repeat this operation for second query
  4. Now merge both the results of the deduped query and again do a deduplicate on over all query.

The trick here is to make this union avoid due a deduplicate.

Solution:

  1. Apply DISTINCT to both of your SELECT queries. Again BOTH of the select queries.
  2. Instead of UNION have UNION ALL.

This should solve the problem of performance.

相关问题