SQL Server: slow query to fill down empty cells with values from the previous non-empty row

Posted by jecbmhm3 on 2023-10-15 in Other

I have this query that fills down empty cells with the value from the previous non-empty row, but it takes very long to run (there are almost 100,000 records).

I hope you can help me with a faster query to achieve this.

Here is the DDL with sample data, followed by my current query:

CREATE TABLE #TEST_INSURANCE_PAYMENTS (
    id_num         int IDENTITY(1,1),
    [Provider]     varchar(59),
    [Location]     varchar(104),
    last_update    datetime2,
    [Total_Charge] money
);
INSERT INTO #TEST_INSURANCE_PAYMENTS
    ([Provider], [Location], [last_update], [Total_Charge])
VALUES
    ('Vimalkumar Veerappan', 'Arizona Heart Specialists', CURRENT_TIMESTAMP, 100.0),
    (' ', 'Banner Boswell Medical Center - Inpatient', CURRENT_TIMESTAMP, 102.0),
    (' ', 'Arizona Heart Specialists WEST', CURRENT_TIMESTAMP, 800.0),
    ('Akash Makkar', 'Arizona Heart Specialists WEST', CURRENT_TIMESTAMP, 500.0),
    (' ', 'Pinnacle Vein & Vascular Center Sun City', CURRENT_TIMESTAMP, 500.0),
    (' ', 'Abrazo Arizona Heart Hospital - Outpatient', CURRENT_TIMESTAMP, 60.0),
    (' ', 'Banner Boswell Medical Center - Inpatient', CURRENT_TIMESTAMP, 60.0),
    (' ', 'Banner Del E Webb Medical Center - Inpatient', CURRENT_TIMESTAMP, 10.0);

SELECT id_num, [Provider], [Location], [Total_Charge], [last_update]
FROM #TEST_INSURANCE_PAYMENTS
WHERE ISNULL([Provider], '') <> ''
UNION
SELECT y.id_num, x.[Provider], y.[Location], y.[Total_Charge], y.[last_update]
FROM #TEST_INSURANCE_PAYMENTS AS x
JOIN (
    SELECT t1.id_num, MAX(t2.id_num) AS MaxID, t1.[Location], t1.[Total_Charge], t1.[last_update]
    FROM (SELECT * FROM #TEST_INSURANCE_PAYMENTS WHERE ISNULL([Provider], '') = '') AS t1
    JOIN (SELECT * FROM #TEST_INSURANCE_PAYMENTS WHERE ISNULL([Provider], '') <> '') AS t2
        ON t1.id_num > t2.id_num
    GROUP BY t1.id_num, t1.[Location], t1.[Total_Charge], t1.[last_update]
) AS y ON x.id_num = y.MaxID
ORDER BY id_num;

Answer #1 (ifsvaxew)

There are a lot of ways of doing this:

Old skool:

SELECT  id_num
,   CASE WHEN provider = '' THEN (SELECT TOP 1 tt.Provider
                                  FROM #TEST_INSURANCE_PAYMENTS tt
                                  WHERE tt.id_num < t.id_num AND tt.Provider <> ''
                                  ORDER BY tt.id_num DESC)
         ELSE provider END AS Provider
,   location, last_update, Total_Charge
FROM    #TEST_INSURANCE_PAYMENTS t

This fetches the last non-empty provider from the previous rows if the current one is empty. The downside is the extra lookup: a correlated subquery against the same table for each empty row.
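On a table of roughly 100,000 rows, the per-row lookup above may benefit from an index that lets it walk backwards by id_num instead of scanning. The following is a minimal sketch under that assumption, not something taken from the original answer; the index name is made up, and whether it helps should be checked against the actual execution plan.

```sql
-- Sketch only: IX_TEST_INSURANCE_PAYMENTS_id_num is a hypothetical name.
-- Assumes #TEST_INSURANCE_PAYMENTS from the question exists in this session
-- and does not yet have a clustered index.
CREATE CLUSTERED INDEX IX_TEST_INSURANCE_PAYMENTS_id_num
    ON #TEST_INSURANCE_PAYMENTS (id_num);
```

With the rows ordered by id_num, the TOP 1 ... ORDER BY id_num DESC subquery can start at the current row and stop at the first non-empty provider it meets.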

New hotness:

SELECT  id_num
,   CASE
        WHEN provider = '' THEN STUFF(
            MAX(RIGHT(CONCAT('0000000000', id_num), 10)
                + CASE WHEN Provider <> '' THEN Provider END) OVER (ORDER BY id_num),
            1, 10, '')
        ELSE provider END AS Provider
,   location, last_update, Total_Charge
FROM    #TEST_INSURANCE_PAYMENTS t

I don't know the name of this technique, but it builds a sortable string for each row by combining the zero-padded id_num with the provider, e.g. 0000000001Vimalkumar Veerappan. Where the provider is empty, the inner CASE returns NULL, so the whole concatenated value becomes NULL.

MAX(...) OVER (ORDER BY id_num) then returns the largest non-NULL string among the rows up to and including the current one (MAX ignores NULLs). Because the zero-padded id_num comes first, that is the string from the most recent row with a non-empty provider. STUFF(...) then removes the 10-digit prefix, and what's left is what we need.

This technique needs only a single windowed call, which might give the best performance; the downside is somewhat obscure code.
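To make the intermediate steps visible, the query below exposes each stage of the trick as its own column. It is an illustrative sketch that assumes the #TEST_INSURANCE_PAYMENTS table from the question; the column aliases sort_key, running_max and filled_provider are invented for this example.

```sql
-- Sketch: show the intermediate values of the MAX-over-concatenation trick.
-- Assumes #TEST_INSURANCE_PAYMENTS from the question exists in this session.
SELECT  t.id_num,
        t.Provider,
        k.sort_key,                                               -- NULL when Provider is empty
        MAX(k.sort_key) OVER (ORDER BY t.id_num) AS running_max,  -- latest non-NULL key so far
        STUFF(MAX(k.sort_key) OVER (ORDER BY t.id_num), 1, 10, '') AS filled_provider
FROM    #TEST_INSURANCE_PAYMENTS AS t
CROSS APPLY (
    -- zero-padded id_num + provider; string + NULL yields NULL under default settings
    SELECT RIGHT(CONCAT('0000000000', t.id_num), 10)
           + CASE WHEN t.Provider <> '' THEN t.Provider END AS sort_key
) AS k
ORDER BY t.id_num;
```

For the sample data, the row with id_num 5 has a NULL sort_key, its running_max is 0000000004Akash Makkar, and filled_provider is Akash Makkar.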


Answer #2 (i34xakig)

We won't be able to tell what's going on without an execution plan (as Dale suggests). In the meantime, maybe this will be a little better:

SELECT x.id_num, COALESCE(NULLIF(x.Provider, ' '), y.Provider) AS [Provider],
       x.Location, x.last_update, x.Total_Charge
  FROM #TEST_INSURANCE_PAYMENTS x
    OUTER APPLY (SELECT MAX(p.id_num) AS mid_num
                   FROM #TEST_INSURANCE_PAYMENTS p
                  WHERE p.id_num < x.id_num
                    AND p.[Provider] IS NOT NULL AND p.Provider <> ' '
                    AND x.Provider = '') a
    LEFT OUTER JOIN #TEST_INSURANCE_PAYMENTS y
      ON a.mid_num = y.id_num;

id_num  Provider              Location                                      last_update                  Total_Charge
------  --------------------  --------------------------------------------  ---------------------------  ------------
1       Vimalkumar Veerappan  Arizona Heart Specialists                     2023-10-04 15:55:59.6833333  100.00
2       Vimalkumar Veerappan  Banner Boswell Medical Center - Inpatient     2023-10-04 15:55:59.6833333  102.00
3       Vimalkumar Veerappan  Arizona Heart Specialists WEST                2023-10-04 15:55:59.6833333  800.00
4       Akash Makkar          Arizona Heart Specialists WEST                2023-10-04 15:55:59.6833333  500.00
5       Akash Makkar          Pinnacle Vein & Vascular Center Sun City      2023-10-04 15:55:59.6833333  500.00
6       Akash Makkar          Abrazo Arizona Heart Hospital - Outpatient    2023-10-04 15:55:59.6833333  60.00
7       Akash Makkar          Banner Boswell Medical Center - Inpatient     2023-10-04 15:55:59.6833333  60.00
8       Akash Makkar          Banner Del E Webb Medical Center - Inpatient  2023-10-04 15:55:59.6833333  10.00
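
To gather the evidence this answer asks for, one option is to run each candidate query with SQL Server's session statistics switched on and compare the reads, timings, and actual plans. This is a minimal sketch, assuming the #TEST_INSURANCE_PAYMENTS table from the question exists in the current session; the query in the middle is only a stand-in for whichever variant is being measured.

```sql
-- Assumes #TEST_INSURANCE_PAYMENTS from the question exists in this session.
SET STATISTICS IO ON;    -- report logical/physical reads per table
SET STATISTICS TIME ON;  -- report CPU and elapsed time
SET STATISTICS XML ON;   -- return the actual execution plan as XML

-- Stand-in for the query under test (here: the correlated-subquery variant).
SELECT  t.id_num,
        CASE WHEN t.Provider = ''
             THEN (SELECT TOP 1 tt.Provider
                   FROM #TEST_INSURANCE_PAYMENTS tt
                   WHERE tt.id_num < t.id_num AND tt.Provider <> ''
                   ORDER BY tt.id_num DESC)
             ELSE t.Provider END AS Provider,
        t.Location, t.last_update, t.Total_Charge
FROM    #TEST_INSURANCE_PAYMENTS t;

SET STATISTICS XML OFF;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The IO and TIME figures appear in the messages output, and the plan XML comes back as an extra result set that can be shared when asking for tuning help.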

Answer #3 (6bc51xsx)

For Azure or SQL Server 2022, you can just use LAG() with IGNORE NULLS.

This avoids joins, correlated subqueries, etc., and only scans the table once.

SELECT
  *,
  LAG(
    NULLIF(Provider, ' '),
    0
  )
    IGNORE NULLS
    OVER (
      ORDER BY id_num
    )
FROM
  #TEST_INSURANCE_PAYMENTS

id_num  Provider              Location                                      last_update              Total_Charge  (No column name)
------  --------------------  --------------------------------------------  -----------------------  ------------  --------------------
1       Vimalkumar Veerappan  Arizona Heart Specialists                     2023-10-04 22:41:21.590  100.0000      Vimalkumar Veerappan
2                             Banner Boswell Medical Center - Inpatient     2023-10-04 22:41:21.590  102.0000      Vimalkumar Veerappan
3                             Arizona Heart Specialists WEST                2023-10-04 22:41:21.590  800.0000      Vimalkumar Veerappan
4       Akash Makkar          Arizona Heart Specialists WEST                2023-10-04 22:41:21.590  500.0000      Akash Makkar
5                             Pinnacle Vein & Vascular Center Sun City      2023-10-04 22:41:21.590  500.0000      Akash Makkar
6                             Abrazo Arizona Heart Hospital - Outpatient    2023-10-04 22:41:21.590  60.0000       Akash Makkar
7                             Banner Boswell Medical Center - Inpatient     2023-10-04 22:41:21.590  60.0000       Akash Makkar
8                             Banner Del E Webb Medical Center - Inpatient  2023-10-04 22:41:21.590  10.0000       Akash Makkar
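
The same fill-down can also be written with LAST_VALUE() and IGNORE NULLS, which some readers may find easier to follow than LAG() with an offset of 0. This is an alternative sketch rather than the answer's own query; it assumes SQL Server 2022 (or an Azure SQL version that supports IGNORE NULLS) and the #TEST_INSURANCE_PAYMENTS table from the question.

```sql
-- Sketch: fill down with LAST_VALUE ... IGNORE NULLS (needs IGNORE NULLS support).
-- Assumes #TEST_INSURANCE_PAYMENTS from the question exists in this session.
SELECT  id_num,
        LAST_VALUE(NULLIF(Provider, ' ')) IGNORE NULLS   -- blank providers become NULL and are skipped
            OVER (ORDER BY id_num
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Provider,
        Location, last_update, Total_Charge
FROM    #TEST_INSURANCE_PAYMENTS
ORDER BY id_num;
```

The explicit ROWS frame states that only the current and earlier rows are considered, and the alias replaces the original Provider column, so the result has the filled-down value under a proper column name.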
