SQL Server How to aggregate information from indefinite number of groups

r1zk6ea1  于 2023-06-04  发布在  其他
关注(0)|答案(2)|浏览(91)

How to aggregate information from indefinite number of groups in TSQL? E.g. we have a table with 2 columns - clients and regions.

Clients Regions
client1 45
client1 45
client1 45
client1 45
client1 43
client1 42
client1 41
client2 45
client2 45
client3 43
client3 43
client3 41
client3 41
client3 41
client3 41

Every client can have any number of regions.

In the example below: client1 has 4 groups of regions, 2nd - 1 group, 3rd - 2 groups.

I want to count gini impurity for each client, i.e. to count - how different are the regions in client.

To do this, I want to apply to each client the following formula:

But the quantity of regions is indefinite (may differ in each client).

This should be calculated:

client1 = 1 - ((4 / 7 ) ^ 2 + (1 / 7 ) ^ 2 + (1 / 7 ) ^ 2  + (1 / 7 ) ^ 2)
client2 = 1 - ((2 / 2 ) ^ 2)
client3 = 1 - ((2 / 6 ) ^ 2 +  (4 / 6 ) ^ 2)

This is the desirable output:

Clients Impurity
client1 61%
client2 0%
client3 44%

Would you prompt me the way to solve the problem.

ljsrvy3e

ljsrvy3e1#

I think the formula could be expressed as a couple of group by:

WITH cte AS (
    SELECT Clients
         , CAST(COUNT(*) AS DECIMAL(10, 0)) / SUM(COUNT(*)) OVER(PARTITION BY Clients) AS tmp
    FROM t
    GROUP BY Clients, Regions
)
SELECT Clients
     , 100 * (1 - SUM(tmp * tmp)) AS GI
FROM cte
GROUP BY Clients

db<>fiddle seems to match expected output.

j91ykkif

j91ykkif2#

Here's how I'd approach this:

  • in a sub-sub-query, do a count(*) as cnt ... group by clients, regions
  • in a sub-query, do a cast(cnt as float)/sum(cnt) over(partition by clients) as pcnt and square it
  • in the outer query do a 1 - sum(pcnt) ... group by clients

There are ways to compact it to not use 2 subqueries but they might not make it more readable or easy to understand. I wasn't totally clear on whether you meant percent (out of 100) or ratio (out of 1) so you might have to add a *100 at an appropriate point

相关问题