I am trying to use SQL to write a matching procedure to match two different healthcare patient data sources. This sample shows two steps, trying to match on an MRN identifier as well as an MBI identifier. The intent is to find a 1:1 patient match on one of the identifiers and SKIP the patient if a 1:1 match cannot be found. I thought the below would work but it’s not. I see why it’s not working now (due to the grouping statement) but I am at a loss for how to execute my intention. Any suggestions on a different setup in CTE expressions or likewise? The full algorithm will have 5-6 steps with different identifiers. I need a patient to “exit the loop” with the first 1:1 match found (per the coalesce statement). If a 1:1 match can’t be found, the patient should be skipped and no match identified, so as not to make the mistake of matching the wrong patients. Hope this makes sense. Here’s what I’ve tried…
WITH base AS (
SELECT
*
FROM tblATB_DataLoad
WHERE CREATEDATE > GETDATE() - 1
),
matchMRN AS (
SELECT bas.ID, pat.PSNNMBR
FROM base bas
INNER JOIN tblMain_Patient_Master pat ON (bas.ENTITYID = pat.ENTITYID AND LOWER(bas.PSNMRN) = LOWER(pat.PSNMRN))
GROUP BY bas.ID, pat.PSNNMBR
HAVING COUNT(bas.ID) = 1
),
matchMBI AS (
SELECT bas.ID, pat.PSNNMBR
FROM base bas
INNER JOIN tblMain_Patient_Master pat ON (bas.ENTITYID = pat.ENTITYID AND LOWER(bas.MBI) = LOWER(pat.CLM1ID))
GROUP BY bas.ID, pat.PSNNMBR
HAVING COUNT(bas.ID) = 1
),
matchLoad AS (
SELECT DISTINCT
bas.ENTITYID
,COALESCE(mrn.PSNNMBR, mbi.PSNNMBR, '') AS PSNNMBR
,bas.PROGRAM
FROM base bas
LEFT OUTER JOIN matchMRN mrn ON (bas.ID = mrn.ID)
LEFT OUTER JOIN matchMBI mbi ON (bas.ID = mbi.ID)
WHERE COALESCE(mrn.PSNNMBR, mbi.PSNNMBR, '') != ''
)
select
*
from matchLoad
;
1条答案
按热度按时间t3psigkw1#
To fix your preselect CTEs so that they exclude matches when there are multiple for the same
bas.ID
in the same step, you can add a windowedCOUNT(*)
to your initial match, wrap that up as a subquery, and then filter for match count = 1 on the containing query. The GROUP BY and HAVING are dropped.To implement the "stop after first unique match found" logic, you can put additional conditions on the LEFT JOINs to exclude their results when an earlier join was successful. This will preserve the order of preference, where the first matching join wins.
Try something like:
The above potentially searches of all possible matches in each of the CTEs, even if not used due to ealier CTE matches.
An alternative that sequentially searches for each match would be to use a series of
OUTER APPLY
s the each search for a match only if the prior ones came up empty.Something like:
The "Guard" parts of OUTER APPLY subqueries will yield either 0 or 1 rows to control wither the following join will execute.
I would suggest testing each for performance, as the all-matches-at-once approach of the first option might actually be better than the selective sequential logic of the second.