numpy - How do I take weights into account in DataFrame.describe? [duplicate]

Posted by wwwo4jvm on 2023-08-05 in Other

This question already has answers here:

pandas describe by - additional parameters (3 answers)
Closed 22 days ago.
I have a sample like this, with students' scores and the population (number of students) for each score:

import numpy as np
import pandas as pd
# Create the DataFrame: each score and how many students (population) got it
sample = pd.DataFrame({
    'score': [595, 594, 593, 592, 591, 590, 589, 588, 587, 586,
              585, 584, 583, 582, 581, 580, 579, 578, 577, 576],
    'population': [705, 745, 716, 742, 722, 746, 796, 750, 816, 809,
                   815, 821, 820, 865, 876, 886, 947, 949, 1018, 967]})

I compute its weighted mean score:

np.average(sample['score'], weights=sample['population'])
# 584.9062443219672
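For reference, np.average with weights computes sum(w * x) / sum(w). A quick sketch that checks this against a manual computation on the same data (x and w are just local names chosen here):

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    'score': [595, 594, 593, 592, 591, 590, 589, 588, 587, 586,
              585, 584, 583, 582, 581, 580, 579, 578, 577, 576],
    'population': [705, 745, 716, 742, 722, 746, 796, 750, 816, 809,
                   815, 821, 820, 865, 876, 886, 947, 949, 1018, 967]})

x = sample['score'].to_numpy()
w = sample['population'].to_numpy()

# Weighted mean written out by hand: sum(w_i * x_i) / sum(w_i)
manual = (w * x).sum() / w.sum()
# The same quantity via NumPy
builtin = np.average(x, weights=w)
print(manual, builtin)
```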


However, when I run sample.describe(), it does not take the weights into account:

sample.describe()

           score   population
count   20.00000    20.000000
mean   585.50000   825.550000
std      5.91608    91.465539
min    576.00000   705.000000
25%    580.75000   745.750000
50%    585.50000   815.500000
75%    590.25000   878.500000
max    595.00000  1018.000000


How can I get sample.describe() to take the weights into account?
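One possible approach, assuming population is a non-negative integer count of students per score (so it can act as a frequency weight): repeat each score population times and call describe() on the result. This is a sketch, not taken from the question or the answer; it materialises one row per student (16,511 here), so it is only practical for modest integer weights. The names expanded and weighted_summary are illustrative.

```python
import pandas as pd

sample = pd.DataFrame({
    'score': [595, 594, 593, 592, 591, 590, 589, 588, 587, 586,
              585, 584, 583, 582, 581, 580, 579, 578, 577, 576],
    'population': [705, 745, 716, 742, 722, 746, 796, 750, 816, 809,
                   815, 821, 820, 865, 876, 886, 947, 949, 1018, 967]})

# Repeat each row's index label `population` times, so every score
# appears once per student, then keep only the score column.
expanded = sample.loc[sample.index.repeat(sample['population']), 'score']

# describe() on the expanded Series yields frequency-weighted statistics:
# count = total students, mean = weighted mean, quartiles = weighted quartiles,
# std = sample std (ddof=1) over all students.
weighted_summary = expanded.describe()
print(weighted_summary)
```

Because the expansion is exact for integer counts, the mean row reproduces the np.average result from the question.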

Answer #1 (vmdwslir)

You need a custom function. Because np.average returns a scalar, assigning it to a new row puts the same value in every column:

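import numpy as np

def describe(df, stats):
    # Standard summary: count, mean, std, min, quartiles, max
    d = df.describe()
    # Add a row labelled `stats`; the scalar weighted mean is broadcast to all columns
    d.loc[stats] = np.average(df['score'], weights=df['population'])
    return d

out = describe(sample, 'wa')
print(out)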
            score   population
count   20.000000    20.000000
mean   585.500000   825.550000
std      5.916080    91.465539
min    576.000000   705.000000
25%    580.750000   745.750000
50%    585.500000   815.500000
75%    590.250000   878.500000
max    595.000000  1018.000000
wa     584.906244   584.906244
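A variant of the helper above, sketched under a few assumptions: the column names become parameters, and an extra weighted standard deviation row is added. weighted_describe, w_mean and w_std are hypothetical names. The weighted std here is the biased (ddof=0) form sqrt(sum(w * (x - mean)^2) / sum(w)), which is only one of several conventions. Unlike the answer, it writes the weighted values only into the value column and leaves the other columns NaN, so population is not labelled with a score-based statistic.

```python
import numpy as np
import pandas as pd

def weighted_describe(df, value_col, weight_col):
    """Hypothetical helper: describe() plus weighted mean and weighted std rows."""
    d = df.describe()
    x = df[value_col]
    w = df[weight_col]
    wmean = np.average(x, weights=w)
    # Biased (ddof=0) weighted standard deviation
    wstd = np.sqrt(np.average((x - wmean) ** 2, weights=w))
    # Setting with enlargement: new rows get the value only in value_col,
    # other columns are filled with NaN
    d.loc['w_mean', value_col] = wmean
    d.loc['w_std', value_col] = wstd
    return d

sample = pd.DataFrame({
    'score': [595, 594, 593, 592, 591, 590, 589, 588, 587, 586,
              585, 584, 583, 582, 581, 580, 579, 578, 577, 576],
    'population': [705, 745, 716, 742, 722, 746, 796, 750, 816, 809,
                   815, 821, 820, 865, 876, 886, 947, 949, 1018, 967]})

out = weighted_describe(sample, 'score', 'population')
print(out)
```

Writing the weighted rows only into the value column is a design choice; if you prefer the answer's behaviour, assign with d.loc['w_mean'] = wmean instead.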

