Calculating within-group percentages in a pandas DataFrame

dced5bon  posted on 2023-03-28  in Other
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'status':['yes','yes','no','no','yes','no','no','yes','no','yes'],
                   'month':['m1','m2','m3','m2','m1','m3','m2','m1','m3','m1'],
                   'points': [12, 29, 34, 14, 10, 11, 7, 36, 34, 22]})

#view DataFrame
print(df)

I tried the following code:

x = df.groupby(['team','month','status']).size().reset_index().rename(columns={0:'cnt'}).pivot(index='team',columns=['month','status'])
x = x.fillna(0)
x

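The expected output is a percentage calculated within each group.

For each month (m1, m2, or m3), I want to calculate the yes% and no%. For example,
for m1 and team A, yes% should be 100% and no% should be 0%; for m2 and team A, yes% should be 100% and no% should be 0%; for m3 and team A, yes% should be 50% and no% should be 50%; and similarly for team B.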

zbq4xfa0 #1

You can use crosstab together with normalize, but the output differs from what you expect:

x = pd.crosstab([df['team'], df['month']], df['status'], normalize='index').mul(100)

print (x)
status         no    yes
team month              
A    m1       0.0  100.0
     m2      50.0   50.0
     m3     100.0    0.0
B    m1       0.0  100.0
     m2     100.0    0.0
     m3     100.0    0.0
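
As a side note on what normalize does: 'index' divides each row by its row total, while 'columns' divides each column by its column total. The sketch below uses a simpler team-by-status table (my own choice, not part of the question) purely to show the difference; multiplying fractions by 100 can leave tiny floating-point noise, so compare with a tolerance rather than exact equality.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'status': ['yes', 'yes', 'no', 'no', 'yes', 'no', 'no', 'yes', 'no', 'yes'],
                   'month': ['m1', 'm2', 'm3', 'm2', 'm1', 'm3', 'm2', 'm1', 'm3', 'm1'],
                   'points': [12, 29, 34, 14, 10, 11, 7, 36, 34, 22]})

# normalize='index': each row (team) sums to 100
by_row = pd.crosstab(df['team'], df['status'], normalize='index').mul(100)

# normalize='columns': each column (status) sums to 100
by_col = pd.crosstab(df['team'], df['status'], normalize='columns').mul(100)

print(by_row.sum(axis=1))  # row totals, approximately 100 each
print(by_col.sum(axis=0))  # column totals, approximately 100 each
```

For the question as asked (percentages within each team and month), 'index' is the one you want, since every (team, month) pair is one row of the crosstab.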

If you need new percentage columns:

x = pd.crosstab([df['team'], df['month']], df['status'])

x = pd.concat([x, x.div(x.sum(axis=1), axis=0).add_suffix('_%').mul(100)], axis=1)
print (x)
status      no  yes   no_%  yes_%
team month                       
A    m1      0    2    0.0  100.0
     m2      1    1   50.0   50.0
     m3      1    0  100.0    0.0
B    m1      0    2    0.0  100.0
     m2      1    0  100.0    0.0
     m3      2    0  100.0    0.0
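
If you would rather stay close to the groupby/size approach from the question, here is a minimal sketch in long format. The column names cnt and pct are just illustrative; the idea is to divide each count by the total of its (team, month) group, obtained with transform('sum').

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'status': ['yes', 'yes', 'no', 'no', 'yes', 'no', 'no', 'yes', 'no', 'yes'],
                   'month': ['m1', 'm2', 'm3', 'm2', 'm1', 'm3', 'm2', 'm1', 'm3', 'm1'],
                   'points': [12, 29, 34, 14, 10, 11, 7, 36, 34, 22]})

# Count rows per (team, month, status), one row per observed combination
counts = df.groupby(['team', 'month', 'status']).size().reset_index(name='cnt')

# Total rows per (team, month), broadcast back to every row of that group
group_total = counts.groupby(['team', 'month'])['cnt'].transform('sum')

# Share of each status within its (team, month) group, as a percentage
counts['pct'] = counts['cnt'] / group_total * 100

print(counts)
```

Note that combinations that never occur in the data (for example team A, m1, no) simply have no row here, whereas crosstab shows them as 0.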

Alternatively, use SeriesGroupBy.value_counts:

x = (df.groupby(['team','month'])['status'].value_counts(normalize=True)
       .unstack('status', fill_value=0)
       .mul(100))
print (x)
status         no    yes
team month              
A    m1       0.0  100.0
     m2      50.0   50.0
     m3     100.0    0.0
B    m1       0.0  100.0
     m2     100.0    0.0
     m3     100.0    0.0
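
If you want the wide layout from the question's pivot (one row per team, columns keyed by month and status), one possible sketch is to unstack the month level of the normalized crosstab and then reorder the column levels. This is my reading of the desired shape, so treat it as an assumption rather than the definitive layout.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'status': ['yes', 'yes', 'no', 'no', 'yes', 'no', 'no', 'yes', 'no', 'yes'],
                   'month': ['m1', 'm2', 'm3', 'm2', 'm1', 'm3', 'm2', 'm1', 'm3', 'm1'],
                   'points': [12, 29, 34, 14, 10, 11, 7, 36, 34, 22]})

# Percentages within each (team, month) group, as in the crosstab solution
x = pd.crosstab([df['team'], df['month']], df['status'], normalize='index').mul(100)

# Move month from the row index into the columns: columns become (status, month)
wide = x.unstack('month', fill_value=0)

# Put month on the outer column level and sort so each month's no/yes sit together
wide = wide.swaplevel(axis=1).sort_index(axis=1)

print(wide)
```

The fill_value=0 only matters if some team has no rows at all for a month; in this sample every team has every month.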

