SQL Server Query for calculating percentile based on average

uqzxnwby  于 2023-11-16  发布在  其他
关注(0)|答案(1)|浏览(112)

I have a group of athletes and I would like to calculate where their scores land from a percentile perspective. Doing so based on their best score doesn't make sense, but doing so based on their average is closer to the mark.

So let's assume this data:
| athleteId | points |
| ------------ | ------------ |
| 1 | 100 |
| 1 | 200 |
| 1 | 300 |
| 1 | 400 |
| 1 | 500 |
| 2 | 101 |
| 2 | 202 |
| 2 | 303 |
| 2 | 404 |
| 2 | 505 |
| 3 | 10 |
| 3 | 20 |
| 3 | 30 |
| 3 | 40 |
| 3 | 50 |

Using this data, I can execute this simple query to calculate their average points:

SELECT
    t.athleteId,
    AVG(t.points) AS average
FROM table t
GROUP BY t.athleteId

And I will get a result of:

athleteIdaverage
1300
2303
330

Using those averages, I now want to calculate the athlete's percentile based on their average score against the original table. To make things a bit more complicated (since this query is part of a bigger picture and I can't lose sight of that), I would also like to know the athlete's best score, the total number of scores, and how their best score ranked. Therefore, I would expect the following results:

athleteIdaveragebestScoretotalScoresrankpercentile
13005001520.60
23035051510.67
3305015110.13

I need the SQL Server query that can produce that result.

23c0lvtd

23c0lvtd1#

I can't get your exact results. The data you set out is quite "santiary", in that each athlete's average score is also one of their actual scores, so the "percentile of average score" neatly resolves itself when we eyeball the data.

This solution implements a single query with CTE's - if you're writing a stored proc then you might want to use temp tables, and a variable for the total row count.

Anyway, here we go.

I'll declare a table variable to hold the data in this example.

declare @table table (athleteId int, points int)

insert into @table values 
  (1, 100), (1, 200), (1, 300), (1, 400), (1, 500)
, (2, 101), (2, 202), (2, 303), (2, 404), (2, 505) 
, (3, 10), (3, 20), (3, 30) ,(3, 40), (3, 50)

Now express the various CTE's.

The first one adds the rank of each score

; with ranks as
(
    select 
        athleteId
        ,points
        ,RANK() over (order by points desc) pointRank
    from @table
)

.. this one throws the total number of scores in ..

,ranksAndRowCount as
(
    select *, (select COUNT(*) from ranks) totalScores from ranks
)

.. this one calculates the averages ..

Note the cast to float, which will force a non-integer calculation later when we determine the percentile.

,averages as 
(
    SELECT
        t.athleteId,
        AVG(t.points) AS average,
        MAX(t.points) as bestScore,
        min(t.pointRank) as bestScoreRank,
        cast(min(t.totalScores) as float) as totalScores
    FROM ranksAndRowCount t
    GROUP BY t.athleteId
)

.. so far, so good.

Then I had difficulty with the percentile. I have taken the athletes calculated average and added it to the total list of scores, performed a rank, pulled out the rank for the average score, and then divided that rank by the total row count + 1.

select   athleteId
        ,average
        ,bestScore
        ,bestScoreRank
        ,totalScores,
         (
            select   avg(pointsInclAverageRanked.pcrank) 
              from   (
                        select   points


                                ,pcrank = rank() over (order by points) 
                          from   (  
                                    select   points 
                                     -- need to put the average back into the original scores
                                      from   @table union select a.average
                                  ) pointsInclAverage
                     ) pointsInclAverageRanked 
              where  pointsInclAverageRanked.points = a.average
         )
          -- there are more total scores, because we added in the average
          / (1.0+totalScores) averageScorePercentile
from averages a

It's not quite the same result as you were looking for, but it gets close.

athleteIdaveragebestScorebestScoreRanktotalScoresaverageScorePercentile
13005002150.625
23035051150.6875
3305011150.1875

Hope this helps you along your way.

相关问题