Make SQL Server CLR aggregate similar to native aggregates

j2cgzkjk  于 2023-03-17  发布在  SQL Server
关注(0)|答案(2)|浏览(122)

I'm comparing my custom CLR aggregate vs AVG (SQL Server 2017). My queries are:

SELECT groupId, Helpers.CustomCLR(value)
  FROM table
 group by groupId

  SELECT groupId, AVG(value)
  FROM table
 group by groupId

And CLR is

[Serializable]
    [SqlUserDefinedAggregate(
        Format.Native, //use clr serialization to serialize the intermediate result  
        IsInvariantToNulls = true, //optimizer property  
        IsInvariantToDuplicates = false, //optimizer property  
        IsInvariantToOrder = true)
]
    [StructLayout(LayoutKind.Sequential)]
    public class CustomCLR
    {
        float a = 2;
        public void Init()
        {
        }

        public void Accumulate(SqlSingle value)
        {
        }

        public void Merge(CustomCLR other)
        {
        }

        public double? Terminate()
        {
            return a;
        }
    }

Execution plans are quite different, though. CLR query does row mode sorting and AVG query does batch mode hash match. How to make CLR aggregate behave like AVG one?

eit6fx6z

eit6fx6z1#

There are definitely some unfortunate differences between built-in and SQLCLR User-Defined Aggregate functions (UDA). One of them should be that SQLCLR cannot do batch mode. I will see if I can find an authoritative reference for this.

Another difference is that SQLCLR UDAs do not support the HashAggregate operator, resulting in:

CLR Aggregate performs an expensive sort

I just tested again in SQL Server 2017 CU 12 and SQL Server 2019 CTP 2.2 and it is still an issue. Please see Bob Beauchemin's post (in that linked forum thread) on Thursday, December 9, 2010 for some suggested work-arounds.

Please also support Bob's enhancement request to allow SQLCLR UDAs to use OPTION(HASH GROUP) :

Allow OPTION(HASH GROUP) with SQLCLR UDAs

wb1gzix0

wb1gzix02#

I think Microsoft SQL Server Team may have change some little things on SQL Server 2022 (or may be it work already like this in 2019 ... I didn't test on 2019) but

on a 2022 Dev Edition, I found that you can speed up CLR UDF using some trick when you have to use the CLR UDF in a query that use an aggregation.

In my case i have a hash function that return an integer hash given a varbinary.

Code is often clearer than long explain :

select 
[DT_PERIODE], 
[SQLHash].[dbo].[XF_HashMurmur2_32](cast([ID_ARTICLE] as varbinary(8000))), 
SUM( [VB_MM2_SOC])
FROM [dbo].[TEST_71_1M_12m]
GROUP BY 
[DT_PERIODE], 
[ID_ARTICLE];

This above query is 8 times faster that the following one

select 
 [DT_PERIODE], 
[SQLHash].[dbo].[XF_HashMurmur2_32](cast([ID_ARTICLE] as varbinary(8000))), 
SUM( [VB_MM2_SOC])
FROM [dbo].[TEST_71_1M_12m]
GROUP BY 
[DT_PERIODE], 
[SQLHash].[dbo].[XF_HashMurmur2_32](cast([ID_ARTICLE] as varbinary(8000)));

The CLR cannot use batch mode but in the first query the aggregation is done first in batch mode, only the last compute scalar node is in row mode. In the second query, the columnstore is read in batch mode but just after that the computing scalar node switch in row mode and the batch mode is switched back in the hast match aggregate node.

The elasped times :

  • Query 1 : CPU time = 172 ms, elapsed time = 168 ms. (97651 rows affected)
  • Query 2 : CPU time = 891 ms, elapsed time = 996 ms. (97651 rows affected)

相关问题