SQL Server SQL ensure single result from query or don’t match

niwlg2el  于 11个月前  发布在  其他
关注(0)|答案(1)|浏览(95)

I am trying to use SQL to write a matching procedure to match two different healthcare patient data sources. This sample shows two steps, trying to match on an MRN identifier as well as an MBI identifier. The intent is to find a 1:1 patient match on one of the identifiers and SKIP the patient if a 1:1 match cannot be found. I thought the below would work but it’s not. I see why it’s not working now (due to the grouping statement) but I am at a loss for how to execute my intention. Any suggestions on a different setup in CTE expressions or likewise? The full algorithm will have 5-6 steps with different identifiers. I need a patient to “exit the loop” with the first 1:1 match found (per the coalesce statement). If a 1:1 match can’t be found, the patient should be skipped and no match identified, so as not to make the mistake of matching the wrong patients. Hope this makes sense. Here’s what I’ve tried…

WITH base AS (
    SELECT
        *
    FROM tblATB_DataLoad 
    WHERE CREATEDATE > GETDATE() - 1
),
matchMRN AS (
    SELECT bas.ID, pat.PSNNMBR
    FROM            base                    bas
        INNER JOIN  tblMain_Patient_Master  pat ON (bas.ENTITYID = pat.ENTITYID AND LOWER(bas.PSNMRN) = LOWER(pat.PSNMRN))
    GROUP BY bas.ID, pat.PSNNMBR
    HAVING COUNT(bas.ID) = 1
),
matchMBI AS (
    SELECT bas.ID, pat.PSNNMBR
    FROM            base                    bas
        INNER JOIN  tblMain_Patient_Master  pat ON (bas.ENTITYID = pat.ENTITYID AND LOWER(bas.MBI) = LOWER(pat.CLM1ID))
    GROUP BY bas.ID, pat.PSNNMBR
    HAVING COUNT(bas.ID) = 1
),
matchLoad AS (
    SELECT DISTINCT 
         bas.ENTITYID
        ,COALESCE(mrn.PSNNMBR, mbi.PSNNMBR, '') AS PSNNMBR
        ,bas.PROGRAM
    FROM                base                bas
        LEFT OUTER JOIN matchMRN            mrn ON (bas.ID = mrn.ID)
        LEFT OUTER JOIN matchMBI            mbi ON (bas.ID = mbi.ID)
    WHERE COALESCE(mrn.PSNNMBR, mbi.PSNNMBR, '') != ''
)
select
    *
from matchLoad
;
t3psigkw

t3psigkw1#

To fix your preselect CTEs so that they exclude matches when there are multiple for the same bas.ID in the same step, you can add a windowedCOUNT(*) to your initial match, wrap that up as a subquery, and then filter for match count = 1 on the containing query. The GROUP BY and HAVING are dropped.

To implement the "stop after first unique match found" logic, you can put additional conditions on the LEFT JOINs to exclude their results when an earlier join was successful. This will preserve the order of preference, where the first matching join wins.

Try something like:

WITH base AS (
    SELECT
        *
    FROM tblATB_DataLoad 
    WHERE CREATEDATE > GETDATE() - 1  -- Or perhaps: DATEADD(day, -1, CAST(GETDATE() AS DATE))
),
matchMRN AS (
    SELECT M.*
    FROM (
        SELECT
            bas.ID, pat.PSNNMBR,
            COUNT(*) OVER(PARTITION BY bas.ID) AS MatchCount
        FROM base bas
        INNER JOIN tblMain_Patient_Master pat
            ON bas.ENTITYID = pat.ENTITYID
            AND LOWER(bas.PSNMRN) = LOWER(pat.PSNMRN)
    ) M
    WHERE M.MatchCount = 1
),
matchMBI AS (
    SELECT M.*
    FROM (
        SELECT
            bas.ID, pat.PSNNMBR,
            COUNT(*) OVER(PARTITION BY bas.ID) AS MatchCount
        FROM base bas
        INNER JOIN tblMain_Patient_Master pat
            ON bas.ENTITYID = pat.ENTITYID
            AND LOWER(bas.MBI) = LOWER(pat.CLM1ID)
    ) M
    WHERE M.MatchCount = 1
),
matchXXX AS (
    ...
),
matchLoad AS (
    SELECT DISTINCT 
         bas.ENTITYID
        ,COALESCE(mrn.PSNNMBR, mbi.PSNNMBR, '') AS PSNNMBR
        ,bas.PROGRAM
    FROM                base                bas
        LEFT OUTER JOIN matchMRN            mrn ON (bas.ID = mrn.ID)
        LEFT OUTER JOIN matchMBI            mbi ON (bas.ID = mbi.ID)
                                                AND mrn.ID IS NULL
        LEFT OUTER JOIN matchXXX            xxx ON (bas.ID = xxx.ID)
                                                AND mbi.ID IS NULL
                                                AND mrn.ID IS NULL
    WHERE COALESCE(mrn.PSNNMBR, mbi.PSNNMBR, '') != ''
)
select
    *
from matchLoad
;

The above potentially searches of all possible matches in each of the CTEs, even if not used due to ealier CTE matches.

An alternative that sequentially searches for each match would be to use a series of OUTER APPLY s the each search for a match only if the prior ones came up empty.

Something like:

SELECT *
FROM (
    SELECT
        *
    FROM tblATB_DataLoad 
    WHERE CREATEDATE > GETDATE() - 1  -- Or perhaps: DATEADD(day, -1, CAST(GETDATE() AS DATE))
) bas
OUTER APPLY (
    SELECT M.*
    FROM (
        SELECT
            bas.ID, pat.PSNNMBR,
            COUNT(*) OVER(PARTITION BY bas.ID) AS MatchCount
        FROM (SELECT Guard = 1) G -- First case always
        JOIN tblMain_Patient_Master pat
            ON pat.ENTITYID = bas.ENTITYID
            AND LOWER(bas.PSNMRN) = LOWER(pat.PSNMRN)
    ) M
    WHERE M.MatchCount = 1
) matchMRN
OUTER APPLY (
    SELECT M.*
    FROM (
        SELECT
            bas.ID, pat.PSNNMBR,
            COUNT(*) OVER(PARTITION BY bas.ID) AS MatchCount
        FROM ( -- No prior match
            SELECT Guard = 1
            WHERE matchMRN.ID IS NULL 
        ) G
        JOIN tblMain_Patient_Master pat
            ON pat.ENTITYID = bas.ENTITYID
            AND LOWER(pat.CLM1ID) = LOWER(bas.MBI)
    ) M
) matchMBI
OUTER APPLY (
    SELECT M.*
    FROM (
        SELECT
            bas.ID, pat.PSNNMBR,
            COUNT(*) OVER(PARTITION BY bas.ID) AS MatchCount
        FROM ( -- No prior match
            SELECT Guard = 1
            WHERE matchMRN.ID IS NULL 
            AND matchMBI.ID IS NULL 
        ) G
        JOIN tblMain_Patient_Master pat
            ON pat.DOB = bas.DOB
            AND LOWER(pat.LastName) = LOWER(bas.LastName)
            AND LOWER(pat.FirstName) = LOWER(bas.FirstName)
    ) M
) matchNameDOB
...
-- This final join only succeeds if we found a match
JOIN tblMain_Patient_Master pat
    ON pat.PSNNMBR = COALESCE(
        matchMRN.PSNNMBR,
        matchMBI.PSNNMBR,
        matchNameDOB.PSNNMBR,
        ...)
;

The "Guard" parts of OUTER APPLY subqueries will yield either 0 or 1 rows to control wither the following join will execute.

I would suggest testing each for performance, as the all-matches-at-once approach of the first option might actually be better than the selective sequential logic of the second.

相关问题