In SQl server I have two big tables with each one about 500,000,000 records, I expect that these two query produce same result, but not!
select count(*)
from ff
left join d on
ff.ID = d.id
where ff.sendtime < '2023-01-21' and d.id is null
select count(*)
from ff
left join d on
ff.sendtime < '2023-01-21' and ff.ID = d.id
where d.id is null
2条答案
按热度按时间4bbkushb1#
First query
Select all ff rows before 2023-01-21. Then outer join all d rows on the ID. Then dismiss all rows where ff had a match in dd. Thus you select all ff rows before 2023-01-21 for which not exists a d row.
Second query
Select all ff rows, no matter which date. Outer join all d rows based on the ID and the ff date. Then dismiss all rows where ff had a match in dd. Thus you select all ff rows the senddate of which is >= 2023-01-21 or that don't have a match in d.
Anti joins
Anti joins, where you outer join a table and only keep the outer joined rows, are meant to select data where there is no matching data in that table. I consider
NOT EXISTS
orNOT IN
way more readable than the join. These clauses make plain what you want to select and what the condition is for selecting the data, by putting the table of interest in theFROM
clause and the lookup of the other table in theWHERE
clause.This is how I'd write the first query:
This is how I'd write the second query:
9wbgstp72#
In the first query the left join between tables ff and d is performed based on the "ID" column where as in the second it checks "ff.sendtime" is less than given time i.e.., 2023-01-21 and "ID" is matching in two tables or not.