Return 1 row from multiple joins to same table - SQL Server

gzjq41n4  于 2023-03-11  发布在  SQL Server
关注(0)|答案(4)|浏览(179)

I have these tables:

CREATE TABLE staff
(
    [case_id] int, 
    [staff] varchar(11), 
    [stafftype] varchar(10)
);
    
INSERT INTO staff
    ([case_id], [staff], [stafftype])
VALUES
    (1, 'Daffy, Duck', 'Primary'),
    (1, 'Bugs, Bunny', 'Additional'),
    (1, 'Elmer, Fudd', 'Additional'),
    (2, 'Daffy, Duck', 'Primary'),
    (2, 'Bugs, Bunny', 'Additional');

CREATE TABLE cases
(
    [case_id] int, 
    [casedate] datetime, 
    [caselocation] varchar(4)
);
    
INSERT INTO cases
    ([case_id], [casedate], [caselocation])
VALUES
    (1, '2023-01-01 00:00:00', 'Home'),
    (2, '2023-01-03 00:00:00', 'Away');

And I want to return a single row per case, there can only be 1 Primary stafftype and a max of 2 Additional stafftypes.

e.g. Result set for case_id = 1
| case_id | caselocation | PrimaryStaff | AdditionalStaff1 | AdditionalStaff2 |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| 1 | Home | Daffy, Duck | Bugs, Bunny | Elmer, Fudd |

SQL Fiddle Demo: http://sqlfiddle.com/#!18/83bbc9/6

hgb9j2n6

hgb9j2n61#

Provided only 2 arbitrary 'Additional' stafftypes are required a plain group by will do.

select 
   a.case_id
   ,casedate
   ,caselocation
   ,b.staff as PrimaryStaff
   ,AdditionalStaff1, AdditionalStaff2
from cases a
inner join staff b on a.case_id = b.case_id and b.stafftype = 'Primary'
inner join (
  select case_id, min([staff]) AdditionalStaff1
     , case when count(*) > 1 then max([staff]) end AdditionalStaff2
  from staff
  where stafftype = 'Additional'
  group by case_id
) c on c.case_id = a.case_id
htzpubme

htzpubme2#

One way to do this is conditional aggregation.

Fiddle: http://sqlfiddle.com/#!18/83bbc9/10/0

with cte as (
select a.case_id
       ,casedate
       ,caselocation
       ,b.staff
       ,ROW_NUMBER() OVER (PARTITION By a.case_id ORDER BY stafftype DESC) RN
from cases a
inner join staff b on a.case_id = b.case_id
  )
 SELECT case_id
        ,casedate
        ,caselocation
        ,MAX(CASE WHEN RN = 1 THEN staff END) PrimaryStaff
        ,MAX(CASE WHEN RN = 2 THEN staff END) AdditionalStaff1
        ,MAX(CASE WHEN RN = 3 THEN staff END) AdditionalStaff2
 FROM cte
 GROUP BY case_id
        ,casedate
        ,caselocation

The cte assigns the 'Primary' stafftype a row number RN of 1, then assigns 2 and 3 to the secondary stafftypes. Then the final select uses the RN to organize the data, and the MAX() and GROUP BY combine to roll the data up into single rows.

You can select only the inner portion of the CTE, then remove the MAX() and GROUP BY in the bottom query and run that to see what is happening step by step.

ie. Run this:

select a.case_id
       ,casedate
       ,caselocation
       ,b.staff
       ,ROW_NUMBER() OVER (PARTITION By a.case_id ORDER BY stafftype DESC) RN
from cases a
inner join staff b on a.case_id = b.case_id

Then this:

with cte as (
select a.case_id
       ,casedate
       ,caselocation
       ,b.staff
       ,ROW_NUMBER() OVER (PARTITION By a.case_id ORDER BY stafftype DESC) RN
from cases a
inner join staff b on a.case_id = b.case_id
  )
 SELECT case_id
        ,casedate
        ,caselocation
        ,CASE WHEN RN = 1 THEN staff END PrimaryStaff
        ,CASE WHEN RN = 2 THEN staff END AdditionalStaff1
        ,CASE WHEN RN = 3 THEN staff END AdditionalStaff2
 FROM cte
mzmfm0qo

mzmfm0qo3#

There are other great answers in this thread, and in many engineering decisions what is the "best" solution will often go by an "it depends" rule. However, I also think this is a small enough an example for a common enough architectural decision point that I think it's worth showing some other "ways" it can be solved. I had considered using STRING_AGG and then splitting back out as a sample of one way to do this. This requires some SUBSTRING + PATINDEX logic:

Option 1:

/* Use a STRING_AGG() and "substring parsing" solution */
SELECT 
c.case_id,
c.caselocation,
f1.PrimaryStaff,
f1.AdditionalStaffList,
f2.*
, IIF(ISNULL(f2.AdditionalListLength,0)   >0,  LEFT(f1.AdditionalStaffList,ISNULL(NULLIF(f2.AdditionalListSubIndex,0)-1,f2.AdditionalListLength)) , NULL) [AdditionalStaff1]
, IIF(ISNULL(f2.AdditionalListSubIndex,0) >0,  RIGHT(f1.AdditionalStaffList,f2.AdditionalListLength-f2.AdditionalListSubIndex) , NULL) [AdditionalStaff2]
FROM
[cases] c
CROSS APPLY (
    SELECT STRING_AGG(s.staff,';') WITHIN GROUP (ORDER BY IIF(s.stafftype='Primary',1,2) ASC) StaffList ,
    MAX(IIF(s.stafftype='Primary',LEN(s.staff)+1,0)) AdditionalStaffSubIndex
    FROM [staff] s WHERE s.[case_id] = c.[case_id] ) s12
OUTER APPLY (
    SELECT LEFT(s12.StaffList,s12.AdditionalStaffSubIndex-1) PrimaryStaff,
    SUBSTRING(s12.StaffList,s12.AdditionalStaffSubIndex+1,LEN(s12.StaffList)) AdditionalStaffList
) f1
OUTER APPLY (
    SELECT PATINDEX('%;%',f1.AdditionalStaffList) AdditionalListSubIndex,
    LEN(f1.AdditionalStaffList) AdditionalListLength
) f2

But I also realized this is a great example to show some of the power of built in XML processing that is available on all supported versions of SQL server. What this can do is abstract the need for a "row numbering" scheme away from the data query itself. XML ,and - JSON , both provide ways to take "a subset of data from a join or subquery" that may contain "0 or more rows" and turn that subset into a "logically single entity/expression" that can be parsed to get specific information "back out" at the end of (or later in) the query pipeline. This can be sometimes be useful in more complex query scenarios when the underlying "data architecture" is 'fixed'.

A snippet showing the basic premise of what the XML-based query will do "per case id":

--APPLY this:
SELECT s.staff FROM [staff] s WHERE s.[case_id] = 1 AND s.[stafftype] = 'Additional' FOR XML AUTO,ELEMENTS

It's also possible that your data constraints may also look something like this as a way to force "only one primary staff per case" and ensuring that "each staff can only be on a given case_id once":

--Each staff is only ONCE per case_id:
CREATE UNIQUE NONCLUSTERED INDEX UQ_StaffCase ON [staff] ([case_id],[stafftype],[staff])
--Only ONE primary staff per case_id:
CREATE UNIQUE NONCLUSTERED INDEX UQ_StaffCase_Primary ON [staff] ([case_id]) INCLUDE ([staff]) WHERE [stafftype] = 'Primary'

And so, the following XML-based query is straightforward to read, and could be adjusted to only join once. This gives you the option to change to an INNER join on the primary staff:

Option 2:

/* An XML-based query solution */
SELECT 
c.case_id,
c.caselocation,
s1.staff [PrimaryStaff],
s2.staffList.value('(/s/staff)[1]','varchar(11)') [AdditionalStaff1],
s2.staffList.value('(/s/staff)[2]','varchar(11)') [AdditionalStaff2]
FROM
[cases] c
LEFT JOIN [staff] s1 ON s1.[case_id] = c.[case_id] AND s1.[stafftype] = 'Primary'
OUTER APPLY (SELECT CAST((SELECT s.staff FROM [staff] s WHERE s.[case_id] = c.[case_id] AND s.[stafftype] = 'Additional' FOR XML AUTO,ELEMENTS) AS XML) ) s2 (staffList)

Further Reading: Earlier above, I mentioned cases where you need to query a "fixed data architecture". If the architecture is still being designed/developed, you may have options to provide input or adjust how the schema is represented so that queries may be able to be constructed differently in the first place, or to take advantage of additional features of the storage and processing engines of sql server . This shows one possibility of initial schema design (where i have added " zz " and " yy " to names to :a,: prevent naming conflicts against the OP schema if testing on the same database, and :b,: clearly show these objects as testing and development schema items).

As additional context not directly presenting a solution to OP, i'll leave the below without further explanation:

/* Alternative Schema Scenario:*/
CREATE SCHEMA yyStaff
GO
CREATE SCHEMA yyCases
GO

CREATE TABLE yyStaff.zz_staff
(
    [staffid]    INT IDENTITY(1,1) NOT NULL CONSTRAINT CK_Staff UNIQUE      CLUSTERED    
  , [staff_name]       VARCHAR(11) NOT NULL CONSTRAINT PK_Staff PRIMARY KEY NONCLUSTERED WITH (DATA_COMPRESSION = ROW) --eg. 'Daffy, Duck'
) WITH(DATA_COMPRESSION = ROW);

CREATE TABLE yyCases.zz_locationtype
(
    [locationtypeid] INT IDENTITY(1,1) NOT NULL CONSTRAINT CK_CaseLocationType UNIQUE      CLUSTERED    
  , [location]              VARCHAR(4) NOT NULL CONSTRAINT PK_CaseLocationType PRIMARY KEY NONCLUSTERED WITH (DATA_COMPRESSION = ROW) --eg. ('Home'),('Away')
) WITH(DATA_COMPRESSION = ROW);

CREATE TABLE yyCases.zz_case
(
    [caseid]            INT IDENTITY(1,1) NOT NULL   CONSTRAINT CK_Case UNIQUE CLUSTERED  
  , [casedate]          DATETIMEOFFSET(0) NOT NULL                                         
  , [reflocationtype]                 INT NOT NULL   CONSTRAINT FK_Case_CaseLocationType FOREIGN KEY REFERENCES yyCases.zz_locationtype (locationtypeid)
);

CREATE NONCLUSTERED INDEX IX_Case__CaseLocationType_Date ON yyCases.zz_case (reflocationtype,casedate)           WITH (DATA_COMPRESSION  = ROW);
CREATE NONCLUSTERED INDEX IX_Case__DateICaseLocationType ON yyCases.zz_case (casedate) INCLUDE (reflocationtype) WITH (DATA_COMPRESSION  = ROW);

/* Alternatively, since there are only two, just have an [IsPrimary] column on zz_casestaff ? */
--CREATE TABLE yyCases.zz_stafftype
--(
--    [stafftypeid] INT IDENTITY(1,1) NOT NULL CONSTRAINT CK_CaseStaffType UNIQUE      CLUSTERED    
--  , [stafftype]         VARCHAR(10) NOT NULL CONSTRAINT PK_CaseStaffType PRIMARY KEY NONCLUSTERED WITH (DATA_COMPRESSION = ROW) --eg. ('Primary'),('Additional')
--) WITH(DATA_COMPRESSION = ROW);
/* yes, see below */

CREATE TABLE yyCases.zz_casestaff
(
    [casestaffid] INT IDENTITY(1,1) NOT NULL CONSTRAINT CK_CaseStaff        UNIQUE      CLUSTERED 
  , [refcase]                   INT NOT NULL CONSTRAINT FK_CaseStaff_Case   FOREIGN KEY REFERENCES yyCases.zz_case  (caseid)
  , [refstaff]                  INT NOT NULL CONSTRAINT FK_CaseStaff_Staff  FOREIGN KEY REFERENCES yyStaff.zz_staff (staffid)
  , [isprimary]                 BIT NOT NULL
);

--Each staff is only ONCE per case_id:
CREATE UNIQUE NONCLUSTERED INDEX UQ_CaseStaff__Case_Staff    ON yyCases.zz_casestaff (refcase,refstaff) INCLUDE (IsPrimary);
--Only ONE primary staff per case_id:
CREATE UNIQUE NONCLUSTERED INDEX UQ_CaseStaff__CaseFPrimary  ON yyCases.zz_casestaff (refcase) INCLUDE (refstaff) WHERE IsPrimary = 1;
--For query processing: Sort cases by staff, and if primary:
CREATE NONCLUSTERED INDEX IX_CaseStaff_Staff ON yyCases.zz_casestaff (refstaff,isprimary) INCLUDE (refcase)
gudnpqoy

gudnpqoy4#

WITH
  staff_sorted
AS
(
  SELECT
    *,
    ROW_NUMBER()
      OVER (
        PARTITION BY case_id, stafftype
            ORDER BY staff
    )
      AS ordinal
  FROM
    staff
),
  staff_pivotted
AS
(
  SELECT
    case_id,
    MAX(CASE WHEN stafftype = 'Primary'                    THEN staff END) AS staff_primary,
    MAX(CASE WHEN stafftype = 'Additional' AND ordinal = 1 THEN staff END) AS staff_additional_1,
    MAX(CASE WHEN stafftype = 'Additional' AND ordinal = 2 THEN staff END) AS staff_additional_2
  FROM
    staff_sorted
  GROUP BY
    case_id
)  
SELECT
  *
FROM
  cases            AS c
INNER JOIN
  staff_pivotted   AS s
    ON c.case_id = s.case_id

相关问题