pandas: calculating percentiles from column values

txu3uszq  posted on 2022-12-25  in  Other

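First of all, apologies for any mistakes in this post; English is not my first language.
I have started studying sports data science and data visualization with Python, purely as a hobby, and I am very much a beginner. I want to calculate a percentile for each column based on that column's highest value. I will put an image below. For example, in the "xg" column the highest value is 1.03, and I want that value to become 100% in a new column, with the other rows scaled accordingly. The same goes for the other columns.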

I want to do something like this:
[The stat / the stat as a percentage compared with all rows]



ycl3bljg #1

You can simply compute the percentile values relative to each column's maximum, as follows:

import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7],
    'xg': [0.25, 0.77, 1.03, 0.12, 0.66, 0.79, 0.92],
    'passes': [15, 19, 22, 26, 23, 12, 31],
    'pass_completion': [80, 73, 66, 74, 92, 50, 70],
    'progression': [7, 5, 12, 5, 8, 4, 14],
})

"""
   ID    xg  passes  pass_completion  progression
0   1  0.25      15               80            7
1   2  0.77      19               73            5
2   3  1.03      22               66           12
3   4  0.12      26               74            5
4   5  0.66      23               92            8
5   6  0.79      12               50            4
6   7  0.92      31               70           14
"""

# Divide each stat column by its own maximum, so the largest value
# in the column becomes 1.0 (i.e. 100%) and the rest scale proportionally.
df['xg_percentile'] = df['xg'] / df['xg'].max()
df['passes_percentile'] = df['passes'] / df['passes'].max()
df['pass_completion_percentile'] = df['pass_completion'] / df['pass_completion'].max()
df['progression_percentile'] = df['progression'] / df['progression'].max()

print(df)
   ID    xg  passes  pass_completion  progression  xg_percentile  passes_percentile  pass_completion_percentile  progression_percentile
0   1  0.25      15               80            7       0.242718           0.483871                    0.869565                0.500000
1   2  0.77      19               73            5       0.747573           0.612903                    0.793478                0.357143
2   3  1.03      22               66           12       1.000000           0.709677                    0.717391                0.857143
3   4  0.12      26               74            5       0.116505           0.838710                    0.804348                0.357143
4   5  0.66      23               92            8       0.640777           0.741935                    1.000000                0.571429
5   6  0.79      12               50            4       0.766990           0.387097                    0.543478                0.285714
6   7  0.92      31               70           14       0.893204           1.000000                    0.760870                1.000000
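
The four near-identical assignments above could also be written once for all stat columns. The following is only a sketch: it assumes every column except ID is a stat, and the "_pct_of_max" suffix is a made-up name for illustration. It also scales the result to 0-100, since the question asks for the maximum to show as 100%.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7],
    'xg': [0.25, 0.77, 1.03, 0.12, 0.66, 0.79, 0.92],
    'passes': [15, 19, 22, 26, 23, 12, 31],
    'pass_completion': [80, 73, 66, 74, 92, 50, 70],
    'progression': [7, 5, 12, 5, 8, 4, 14],
})

# Assumption: every column except the identifier is a stat to be scaled.
stat_cols = df.columns.drop('ID')

# Divide each column by its own maximum (aligned on column names),
# then scale to 0-100 and round to one decimal place.
pct_of_max = df[stat_cols].div(df[stat_cols].max()).mul(100).round(1)

# Append the scaled columns under a hypothetical "_pct_of_max" suffix.
result = df.join(pct_of_max.add_suffix('_pct_of_max'))
print(result)
```

Note that this gives each value as a share of the column maximum, which is what the question describes, rather than a percentile in the statistical sense.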

h5qlskok #2

You can use the pandas.DataFrame.rank function:

import pandas as pd

data_dict = {
    "xg":[0.25,0.77,1.03,0.12,0.66,0.79,0.92],
    "passes":[15,19,22,26,23,12,31],
    "passCompletion":[80,72,66,74,92,50,70],
    "progression":[7,5,12,5,8,4,14]}

df = pd.DataFrame(data_dict)
# pct=True returns each value's rank as a fraction of the number of rows
df['xg_pctile'] = df['xg'].rank(pct=True)
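
The same call can be applied to the whole frame at once. This is a sketch rather than part of the original answer; the "_pctile" suffix simply extends the naming used above. With the default settings, tied values share the average of their ranks.

```python
import pandas as pd

df = pd.DataFrame({
    "xg": [0.25, 0.77, 1.03, 0.12, 0.66, 0.79, 0.92],
    "passes": [15, 19, 22, 26, 23, 12, 31],
    "passCompletion": [80, 72, 66, 74, 92, 50, 70],
    "progression": [7, 5, 12, 5, 8, 4, 14],
})

# Rank every column independently; pct=True divides each rank by the row count.
# Ties (e.g. the two 5s in "progression") receive the average of their ranks.
pct_rank = df.rank(pct=True).add_suffix("_pctile")

result = df.join(pct_rank)
print(result.round(3))
```

This measure differs from dividing by the maximum: the top value in each column still gets 1.0, but the other rows are spaced evenly by rank rather than in proportion to their values. For example, the lowest xg (0.12) gets 1/7 here, versus about 0.117 when divided by 1.03.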
