SQL Server Why move where clause to join clause produce different result

bbmckpt7  于 2023-06-21  发布在  其他
关注(0)|答案(2)|浏览(107)

In SQl server I have two big tables with each one about 500,000,000 records, I expect that these two query produce same result, but not!

select count(*)
from ff
         left join d on
    ff.ID = d.id
where ff.sendtime < '2023-01-21' and d.id is null
select count(*)
from ff
         left join d on
    ff.sendtime < '2023-01-21' and ff.ID = d.id
where d.id is null
4bbkushb

4bbkushb1#

First query

Select all ff rows before 2023-01-21. Then outer join all d rows on the ID. Then dismiss all rows where ff had a match in dd. Thus you select all ff rows before 2023-01-21 for which not exists a d row.

Second query

Select all ff rows, no matter which date. Outer join all d rows based on the ID and the ff date. Then dismiss all rows where ff had a match in dd. Thus you select all ff rows the senddate of which is >= 2023-01-21 or that don't have a match in d.

Anti joins

Anti joins, where you outer join a table and only keep the outer joined rows, are meant to select data where there is no matching data in that table. I consider NOT EXISTS or NOT IN way more readable than the join. These clauses make plain what you want to select and what the condition is for selecting the data, by putting the table of interest in the FROM clause and the lookup of the other table in the WHERE clause.

This is how I'd write the first query:

select count(*)
from ff
where sendtime < '2023-01-21'
and not exists (select * from d where d.id = ff.id);

This is how I'd write the second query:

select count(*)
from ff
where sendtime >= '2023-01-21'
or not exists (select * from d where d.id = ff.id);
9wbgstp7

9wbgstp72#

In the first query the left join between tables ff and d is performed based on the "ID" column where as in the second it checks "ff.sendtime" is less than given time i.e.., 2023-01-21 and "ID" is matching in two tables or not.

相关问题