SQL Server: How to aggregate information from an indefinite number of groups

r1zk6ea1, posted on 2023-06-04 in Other


How can I aggregate information from an indefinite number of groups in T-SQL? For example, we have a table with two columns: clients and regions.

Clients Regions
client1 45
client1 45
client1 45
client1 45
client1 43
client1 42
client1 41
client2 45
client2 45
client3 43
client3 43
client3 41
client3 41
client3 41
client3 41

Every client can have any number of regions.

In the example above, client1 has 4 groups of regions, client2 has 1 group, and client3 has 2 groups.

I want to calculate the Gini impurity for each client, i.e. a measure of how mixed the regions are within a client.

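To do this, I want to apply the following formula to each client, where p_i is the share of the client's rows that fall in region i:

impurity = 1 - (p_1 ^ 2 + p_2 ^ 2 + ... + p_k ^ 2)

But the number of regions k is not fixed (it may differ for each client).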

This is what should be calculated:

client1 = 1 - ((4 / 7) ^ 2 + (1 / 7) ^ 2 + (1 / 7) ^ 2 + (1 / 7) ^ 2)
client2 = 1 - ((2 / 2) ^ 2)
client3 = 1 - ((2 / 6) ^ 2 + (4 / 6) ^ 2)

This is the desired output:

Clients Impurity
client1 61%
client2 0%
client3 44%

Could you suggest a way to solve this problem?

sql-server

Source: https://stackoverflow.com/questions/58928264/how-to-aggregate-information-from-indefinite-number-of-groups

2 answers

ljsrvy3e1#

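I think the formula can be expressed with a couple of GROUP BY steps:

WITH cte AS (
    -- One row per (client, region): that region's share of the client's rows.
    -- The CAST avoids integer division.
    SELECT Clients
         , CAST(COUNT(*) AS DECIMAL(10, 0)) / SUM(COUNT(*)) OVER(PARTITION BY Clients) AS tmp
    FROM t
    GROUP BY Clients, Regions
)
-- One row per client: 1 minus the sum of squared shares, as a percentage.
SELECT Clients
     , 100 * (1 - SUM(tmp * tmp)) AS GI
FROM cte
GROUP BY Clients;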

The result in a db<>fiddle demo seems to match the expected output.

j91ykkif2#

Here's how I'd approach this:

In a sub-sub-query, do a count(*) as cnt ... group by clients, regions.
In a sub-query, do a cast(cnt as float)/sum(cnt) over(partition by clients) as pcnt and square it.
In the outer query, do a 1 - sum(pcnt) ... group by clients, where pcnt is the already squared share.

There are ways to compact it so that it does not use 2 subqueries, but they might not make it more readable or easier to understand. I wasn't totally clear on whether you meant a percentage (out of 100) or a ratio (out of 1), so you might have to add a * 100 at an appropriate point.
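
The three steps described in this answer could be sketched roughly as follows. This is only one possible reading of them, not the answerer's own code: the table name t is borrowed from the first answer, the column types and the aliases cnt, pcnt_sq, counts and shares are assumptions made for the illustration, and the sample rows are the ones from the question.

```sql
-- Assumed sample table; the name t and the column types are illustrative.
CREATE TABLE t (Clients VARCHAR(20), Regions INT);

INSERT INTO t (Clients, Regions) VALUES
    ('client1', 45), ('client1', 45), ('client1', 45), ('client1', 45),
    ('client1', 43), ('client1', 42), ('client1', 41),
    ('client2', 45), ('client2', 45),
    ('client3', 43), ('client3', 43),
    ('client3', 41), ('client3', 41), ('client3', 41), ('client3', 41);

-- Outer query: 1 minus the sum of squared shares, scaled to a percentage.
SELECT Clients
     , 100 * (1 - SUM(pcnt_sq)) AS Impurity
FROM (
    -- Sub-query: each region's share of its client's rows, squared.
    -- CAST to FLOAT avoids integer division.
    SELECT Clients
         , POWER(CAST(cnt AS FLOAT) / SUM(cnt) OVER(PARTITION BY Clients), 2) AS pcnt_sq
    FROM (
        -- Sub-sub-query: number of rows per (client, region) pair.
        SELECT Clients, Regions, COUNT(*) AS cnt
        FROM t
        GROUP BY Clients, Regions
    ) AS counts
) AS shares
GROUP BY Clients
ORDER BY Clients;
```

On the sample data this should give roughly 61.2 for client1, 0 for client2 and 44.4 for client3, in line with the 61%, 0% and 44% the question asks for. Dropping the 100 * factor returns a ratio between 0 and 1 instead.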
