SQL Server Subquery with 'select top 1' returning only NULL values despite table field having no NULLs

a5g8bdjr  于 2023-08-02  发布在  其他
关注(0)|答案(1)|浏览(107)

Why is this sub query bringing back NULL values when the table learner_employment contains no NULL values in the EMPLOYMENT_STATUS field? If I run it as as standalone query on any person code I get values. It seems to be an issue with the 'top 1' command, as there are values if I remove this. However I require only the earliest employment record.

select 
p.PERSON_CODE,
EMPLOYMENT_STATUS,
EMPLOYMENT_INTENSITY,
DATE_STATUS_APPLIES
from people p
left join 
(select top 1 PERSON_CODE, 
EMPLOYMENT_STATUS, 
EMPLOYMENT_INTENSITY, 
DATE_STATUS_APPLIES 
from learner_employment
order by DATE_STATUS_APPLIES) emp 
on emp.PERSON_CODE = p.PERSON_CODE
pcww981p

pcww981p1#

I suspect the problem is your misunderstanding of how the JOIN is working. I'm going to provide a MRE here, as your question lacks one. Firstly some sample data:

CREATE TABLE dbo.SomeTable (SomeID int,
                            SomeValue varchar(10));
GO
CREATE TABLE dbo.AnotherTable (AnotherID int,
                               SomeID int,
                               AnotherDate date);
GO

INSERT INTO dbo.SomeTable (SomeID,
                           SomeValue)
VALUES(1,'abc'),
      (2,'def'),
      (3,'ghi'),
      (4,'xyz');
GO
INSERT INTO dbo.AnotherTable (AnotherID,
                              SomeID,
                              AnotherDate)
VALUES (1, 1, GETDATE()),
       (2, 1, GETDATE() + 1),
       (3, 3, GETDATE() + 4),
       (4, 3, GETDATE() + 2),
       (5, 3, GETDATE() - 1),
       (6, 4, GETDATE() + 3);
GO

Now lets write a query that represents what your does:

SELECT S.SomeID,
       S.SomeValue,
       A.AnotherDate
FROM dbo.SomeTable S
     LEFT JOIN (SELECT TOP (1)
                       sq.SomeID,
                       sq.AnotherDate
                FROM dbo.AnotherTable sq
                ORDER BY sq.AnotherDAte DESC) A ON S.SomeID = A.SomeID;

This returns the following dataset:

SomeIDSomeValueAnotherDate
1abcNULL
2defNULL
3ghi2023-07-15
4xyzNULL

This is expected, the subquery returns one row and then that is LEFT JOIN ed onto. We can see what the subquery would return with the following:

SELECT TOP (1)
       sq.SomeID,
       sq.AnotherDate
FROM dbo.AnotherTable sq
ORDER BY sq.AnotherDate DESC;

Which, unsurprisingly, returns the following:

SomeIDAnotherDate
32023-07-15

This is because SomeID3 has the row with the highest value of AnotherDate and so in the priordata set only SomeID3 has a value in AnotherDate ( 1 , 2 , and 4 aren't equal to 3 so the LEFT JOIN doesn't return a row).

Perhaps what you want is instead of a JOIN is a correlated query. You need to use APPLY for this. As you have a LEFT JOIN , then presumably you need an OUTER APPLY . This would then return the TOP (1) row for each correlation:

SELECT S.SomeID,
       S.SomeValue,
       A.AnotherDate
FROM dbo.SomeTable S
    OUTER APPLY (SELECT TOP (1)
                      sq.AnotherDate
               FROM dbo.AnotherTable sq
               WHERE sq.SomeID = S.SomeID
               ORDER BY sq.AnotherDate DESC) A;

Which returns the following:

SomeIDSomeValueAnotherDate
1abc2023-07-12
2defNULL
3ghi2023-07-15
4xyz2023-07-14

Though if this is your goal,you should just use a MAX (not sure that's the case here mind):

SELECT S.SomeID,
       S.SomeValue,
       MAX(A.AnotherDate) AS AnotherDate
FROM dbo.SomeTable S
    LEFT JOIN dbo.AnotherTable A ON S.SomeID = A.SomeID
GROUP BY S.SomeID,
         S.SomeValue;

Alternatively, you can use the Get top 1 row of each group solution.

相关问题