SQL Server Join two tables on multiple columns matching a condition

j8yoct9x  于 2023-08-02  发布在  其他
关注(0)|答案(2)|浏览(95)

I'm trying to build a query for my SQL Server database (but I guess the behavior can be the same for every relational database).

I have two tables that can be related by two columns. They represent a link between two entities.

The first table (example below) means that the entities of the same "folder" (represented by another table that is not relevant in this case) can belong to the same group or the group can be null. There is no way that with the same FOLDER_ID you can have different groups: there can be elements or of the same group or NULL.

Table 1

ID     NAME     GROUP     FOLDER_ID     
-----------------------------------
1      A        Group1    1
2      B        Group1    1
3      C        Group1    1
4      D        NULL      1

Table 2 represents the links between the elements present in the Table 1 and contains values similar to the following example

Table 2

ID     FROM     TO
--------------------
1      1        2
2      1        3
3      2        3
4      1        4
5      2        4
6      4        2

I need to write a query that joins the two table between Table1.id = Table2.from OR Table1.id = Table2.to having the condition that Table1.GROUP IS NOT NULL .

I wrote the following query:

SELECT * 
FROM Table1 t1 
INNER JOIN Table2 t2 ON (t1.id = t2.from OR t1.id = t2.to) 
                     AND t1.group IS NOT NULL

The problem is that the following query extracts all the results that I expect for the column FROM (so the ones where GROUP is not NULL ), but all the elements from column TO that are linked.

See the example of result below

t1.ID     t1.GROUP     t2.FROM     t2.TO
----------------------------------------
1         Group1       1           2
2         Group1       1           3
3         Group1       2           3
4         Group1       1           4
5         Group1       2           4

The query I provided as an example excludes correctly the row where t2.from = 4 (because group of t1.id = 4 is null) but I also want to exclude all the rows where the t2.to is 4 for the same reason that t1.group of 4 is NULL.

How can I achieve that?

Please: I know that probably the database design could not be the best possible for the representation, but there is no way that I can change it at the time being.

I know that probably it is a silly question but I cannot figure it out how to do that and I'm struggling a lot for that, hope that someone can help.

Thanks in advance for the support

cl25kdpy

cl25kdpy1#

Since you already mention in your question you're aware of the design failings and can't change then I'll redact my original opening paragraph ;)

Note your query is not valid without delimiting the reserved words - it's best to avoid reserved keywords eg from, group etc as you'll be forced to always delimit them.

Joins based on an or predicate never perform well as it mitigates effectively using an index - so it's better to union two queries.

You then need to check both the joined from and to match a non-null row, which you can do efficiently with exists:

select *
from (
  select t2.ID, t1.[GROUP], t2.[FROM], t2.[TO]
  from t1 
  join t2 on (t1.Id = t2.[FROM]) 
  union all
  select t2.ID, t1.[GROUP], t2.[FROM], t2.[TO]
  from t1 
  join t2 on ( t1.Id = t2.[TO]) 
)t
where not exists (
  select * from t1 
  where t1.Id in ([FROM], [TO]) and t1.[GROUP] is null
);
eivgtgni

eivgtgni2#

When you use or it's mean when t1.id = t2.from is true , t1.id = t2.to will not check. You need to add where .

where t2.to not in (select id from Table1 where group IS NULL)

相关问题