I am not sure how I can implement an IF/THEN statement in my code.
In the first query I'm creating a table that 1) has dengue cases, 2) concatenates all lab results, 3) creates a travel variable where if the case travelled it will pull the travel location, if they did not then it will be null. From that table, the second query is keeping the last case per day, so removing all earlier cases per day.
There are some cases that are identical in time but have two different travel values. I would like to keep all data that 1) has the most recent InsertedDateTime (this can have NULL travel values), 2) if they have identical time values, then it will keep the case that has a travel value not equal to "NULL".
Ideally, I would like to add a statement that says, "if rn=1 and travel = “NULL” then delete”. Seeing that I'm new to SQL and code predominately in SAS, I'm not sure where this should go or how it would be correctly written. Thanks in advance for not being rude and understanding I'm new (ya'll are pretty harsh). Also, I have no control over how the servers or databases are named.
Example Data:
PatientFirstName PatientLastName PatientDOB InsertedDateTime FacilityName FacilityAddressCounty EncounterDateFrom Reason eCR reported Lab results Travel
Jane Doe 20000101 2/12/2023 16:19 BHospital OldCounty 2/2/2023 8:20 Dengue Virus Infection (as a diagnosis or active problem),Fever (Temperature = 38°C [100.4°F]) NULL NULL
Jane Doe 20000101 2/12/2023 16:19 BHospital OldCounty 2/2/2023 8:20 Dengue Virus Infection (as a diagnosis or active problem),Fever (Temperature = 38°C [100.4°F]) NULL Mexico
DROP Table IF EXISTS #labs
SELECT ecr_FK, ObservationResultTypeDisplayName,
ObservationResultValueType, ObservationResultValue,
ObservationResultValueUnit
INTO #labs
FROM ecr.ecr.dbo.Results
where InsertedDateTime >= '2021-01-03' and
ObservationResultTypeDisplayName like '%dengue%'
CREATE NONCLUSTERED INDEX IX_Results_FK ON #labs (eCR_FK)
DROP Table IF EXISTS #ReportabilityRules
SELECT *
INTO #ReportabilityRules
FROM ecr.ecr.dbo.ReportabilityRules
CREATE NONCLUSTERED INDEX IX_RR_FK ON #ReportabilityRules (RR_FK)
DROP TABLE IF EXISTS #ALL
select eCR_PK, a.PatientFirstName, a.PatientLastName, a.PatientDOB, a.InsertedDateTime,
a.FacilityName, a.FacilityAddressCounty,a.EncounterDateFrom, s.codeDisplayName
,(SELECT CASE WHEN ROW_NUMBER() OVER (ORDER BY rr2.RR_fk) = 1 THEN rr2.reportabilityRules ELSE ', ' + rr2.reportabilityRules END
FROM #ReportabilityRules rr2 WHERE d.RR_pk=rr2.RR_fk
FOR XML PATH('')) [Reason eCR reported]
,(SELECT CASE WHEN ROW_NUMBER() OVER (ORDER BY l2.eCR_fk) = 1 THEN concat(l2.ObservationResultTypeDisplayName, ' ', l2.ObservationResultValueType, ' ', l2.ObservationResultValue, ' ', l2.ObservationResultValueUnit) ELSE ', ' + concat(l2.ObservationResultTypeDisplayName, ' ', l2.ObservationResultValueType, ' ', l2.ObservationResultValue, ' ', l2.ObservationResultValueUnit) END
FROM #labs l2 WHERE l1.eCR_FK=l2.eCR_FK
FOR XML PATH('')) [Lab results]
,case when s.codeDisplayName like '%travel%' then s.obsValue else NULL END [Travel]
INTO #ALL
from ecr.dbo.ecr a with (NOLOCK)
left join ecr.dbo.RR d on a.eCR_PK=d.eCR_FK
left join #labs l1 on l1.eCR_FK=a.eCR_PK
left join eCR.ecr.dbo.social s on s.eCR_FK=a.eCR_PK
where ConditionDisplayName like '%dengue%' and a.InsertedDateTime >= '2021-01-03'
select distinct PatientFirstName, PatientLastName, PatientDOB, InsertedDateTime, FacilityName, FacilityAddressCounty, EncounterDateFrom, [Reason eCR reported], [Lab results], [Travel]
from
(select *,
ROW_NUMBER () over (partition by patientdob, cast(inserteddatetime as date)
order by inserteddatetime desc) as rn
from #ALL
) as ecr_date
where rn=1
3条答案
按热度按时间kmpatx3s1#
BLUF: The problem is not in the ROW_NUMBER() filter logic in the final select, but in the JOIN logic of the preceding query that feeds the #ALL table.
After a deep dive into your posted code, my understanding is that you are gathering information on who has contracted a Dengue Virus Infection and what countries have these patients recently traveled to.
I see what appears to be three independent one-to-many joins in the query that feeds the
#ALL
temp table -ecr.dbo.RR
,#labs
, andeCR.ecr.dbo.social
. This is likely the primary cause of the duplicate/conflicting rows that you are attempting to resolve.The #labs join can be eliminated outright. Its only use is already contained within the
[Lab results]
subquery. Theecr.dbo.RR
join can also be eliminated from the outer query if you include it as aJOIN
in the[Reason eCR reported]
subquery.This leaves the
eCR.ecr.dbo.social
table. Two columns are referenced from this table -codeDisplayName
andobsValue
. ThecodeDisplayName
column is referenced in the#ALL
query, but the resulting value is not used by the final select. If it were used, I might suggest a STRING_AGG() concatenation like for[Lab results]
and[Reason eCR reported]
. But since it is unused, I believe it can just be eliminated outright.As for
obsValue
, this depends on whether or not there can be more than one record (per patient encounter) withs.codeDisplayName like '%travel%'
and how you need to handle potential multiple eligible obsValue values. One option is to move theTravel
calculation into a subquery like[Reason eCR reported]
and[Lab results]
that calculates an aggregate value (typicallyMAX()
,SUM()
, orSTRING_AGG()
). Another option is to use aOUTER APPLY(SELECT TOP 1 ... ORDER BY...)
to pick at most a single row and use its value.For now, I am guessing that the travel
obsValue
value is a string value, possibly a country name. In that case, if there is more than one matching travel record representing multiple travel destinations, I expect we would want to useSTRING_AGG()
to get them all.The rewritten
#ALL
query might be like the following:I've eliminated all of the one-to-many joins from the outer query, converted the
FOR XML
subqueries to useSTRING_AGG()
, and added a few newlines for easier reading.The above is not tested (we have no posted sample data) and might need some tweaks. In particular, the
STRING_AGG() ... ORDER BY
clauses need to be reviewed and appropriate sort column referenced selected.As for the final select, I believe you shoudl be able to eliminate the
DISTINCT
. I do have a concern about theROW_NUMBER() ...partition by ...
. Sincepatientdob
is not necessarily unique to the patient, I would suggest also includingPatientFirstName
andPatientLastName
as a minimum. Even that is not guaranteed. If you can access a patient ID in your source data, using that would be more reliable.Since
eCR.ecr.dbo.social
is a remote table reference, there may be a need to preselect that data (within the relevant date range) and pull it to a local temp table to avoid excessive cross-server accesses.If you have further clarifications on your requirements, add them to your original post and/or add a comment below and I will look at updating this answer.
tzxcd3kk2#
CASE WHEN expressions can be used to do inline column level "if this then that" type of logic.
Example
Would be
Now I don't understand what you're actually asking is but I think this will help you get there. If you're trying to remove these rows from the table #ALL, then I think it might be best to do your ROW_NUMBER() in the previous query and filter out by these two conditions you have.
lsmepo6l3#
It's really hard to be more specific without example DDL/DML here's an example of that:
I realize this simplifies your question down a lot, but as an example of what I think you're shooting for:
Here we're assigning a Row Number for each row, partition by the date and then ordered by the date and time, unless one, or both, of the columns we care about are null, in which case we treat it as the end of time. Then the outer query only returns the earliest datetime for that date, that has non-nulls in both columns.
Example using OPs example data (assumed data types) provided in the comments (assuming partition on first, last and DOB):