pandas Python: group by columns and apply a function

r1zhe5dt  posted on 2023-03-28  in Python

I have a DataFrame that looks like this. It contains every division in both conferences for the years 2000-2022.

Tm        Conference   Division    W-L%.    Year  
Bills         AFC        East      0.813    2022   
Dolphins      AFC        East      0.529    2022  
Patriots      AFC        East      0.471    2022    
Jets          AFC        East      0.412    2022   
Cowboys       NFC        East      0.706    2022   
Giants        NFC        East      0.559    2022   
Eagles        NFC        East      0.824    2022  
Commanders    NFC        East      0.500    2022

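I want to group by conference, division, and year and create a new column called "Division W-L%". For each team, that column should hold the average W-L% of the other teams in the same division, conference, and year, excluding the team being calculated. I know the formula for the division W-L%: df['Division_W-L%'] = (df['W-L%'].sum() - df['W-L%']) / (len(df) - 1).
This is what I want the DataFrame to look like. For example, for 'Bills' the Division W-L% is (0.529 + 0.471 + 0.412) / 3, because those three teams are in the same conference, division, and year as the Bills.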

Tm        Conference   Division    W-L%.    Year  Division W-L%
Bills         AFC        East      0.813    2022    0.470667
Dolphins      AFC        East      0.529    2022    0.565333
Patriots      AFC        East      0.471    2022    0.584667
Jets          AFC        East      0.412    2022    0.604333
Cowboys       NFC        East      0.706    2022    0.627667
Giants        NFC        East      0.559    2022    0.676667
Eagles        NFC        East      0.824    2022    0.588333
Commanders    NFC        East      0.500    2022    0.696333
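
The leave-one-out average described above can be checked by hand for one group. The following is only a minimal sketch in plain Python; it uses the four AFC East values from the table, and the variable names (afc_east, division_wl) are made up for illustration:

```python
# W-L% values for the 2022 AFC East, copied from the table above.
afc_east = {"Bills": 0.813, "Dolphins": 0.529, "Patriots": 0.471, "Jets": 0.412}

# Sum of the whole group, computed once.
total = sum(afc_east.values())

# For each team: remove its own W-L% from the sum, then average over the others.
division_wl = {
    team: (total - wl) / (len(afc_east) - 1)
    for team, wl in afc_east.items()
}

for team, value in division_wl.items():
    print(f"{team:<10}{value:.6f}")
```

The same subtraction has to be done once per (conference, division, year) group rather than over the whole DataFrame, which is why the formula with df['W-L%'].sum() and len(df) is not enough on its own.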

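I tried to do what I described above, that is, group by those three columns and then apply the formula to the W-L% column, but I keep getting errors. Any help is appreciated!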

n3schb8v  #1

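You can use transform instead of apply. Compute the sum of each group, subtract the current row's W-L%., then divide by the group size minus 1 (because you want to exclude the row itself):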

df['Division W-L%'] = (df.groupby(['Conference', 'Division', 'Year'])['W-L%.']
                         .transform(lambda x: (x.sum() - x) / (len(x) - 1)))

Output:

>>> df
           Tm Conference Division  W-L%.  Year  Division W-L%
0       Bills        AFC     East  0.813  2022       0.470667
1    Dolphins        AFC     East  0.529  2022       0.565333
2    Patriots        AFC     East  0.471  2022       0.584667
3        Jets        AFC     East  0.412  2022       0.604333
4     Cowboys        NFC     East  0.706  2022       0.627667
5      Giants        NFC     East  0.559  2022       0.676667
6      Eagles        NFC     East  0.824  2022       0.588333
7  Commanders        NFC     East  0.500  2022       0.696333
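
To reproduce this end to end, here is a self-contained sketch. The DataFrame is rebuilt from the eight sample rows in the question (the real data covers 2000-2022), and the keys variable is only a convenience name introduced here:

```python
import pandas as pd

# Sample rows from the question; the column is named "W-L%." with a trailing dot.
df = pd.DataFrame({
    "Tm": ["Bills", "Dolphins", "Patriots", "Jets",
           "Cowboys", "Giants", "Eagles", "Commanders"],
    "Conference": ["AFC"] * 4 + ["NFC"] * 4,
    "Division": ["East"] * 8,
    "W-L%.": [0.813, 0.529, 0.471, 0.412, 0.706, 0.559, 0.824, 0.500],
    "Year": [2022] * 8,
})

keys = ["Conference", "Division", "Year"]

# transform returns a Series aligned with df's index, so it can be assigned
# directly as a new column. Inside the lambda, x is one group's W-L%. values.
df["Division W-L%"] = df.groupby(keys)["W-L%."].transform(
    lambda x: (x.sum() - x) / (len(x) - 1)
)

print(df)
```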

yeotifhr  #2

Try this:

df['Division W-L% NEW'] = df.groupby(['Conference', 'Division', 'Year'])['W-L%.'].transform('sum') - df['W-L%.']
df['Division W-L% NEW'] = df.groupby(['Conference', 'Division', 'Year'])['Division W-L% NEW'].transform(lambda x: x / (len(x)-1))
print(df)

Prints:

           Tm Conference Division  W-L%.  Year  Division W-L%  Division W-L% NEW
0       Bills        AFC     East  0.813  2022       0.470667           0.470667
1    Dolphins        AFC     East  0.529  2022       0.565333           0.565333
2    Patriots        AFC     East  0.471  2022       0.584667           0.584667
3        Jets        AFC     East  0.412  2022       0.604333           0.604333
4     Cowboys        NFC     East  0.706  2022       0.627667           0.627667
5      Giants        NFC     East  0.559  2022       0.676667           0.676667
6      Eagles        NFC     East  0.824  2022       0.588333           0.588333
7  Commanders        NFC     East  0.500  2022       0.696333           0.696333
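
The same calculation can also be written without a lambda, using only the built-in "sum" and "count" aggregations. This is a sketch under the assumption that W-L%. has no missing values, since "count" ignores NaN while len(x) does not; the names grouped, group_sum and group_count are illustrative only:

```python
import pandas as pd

# Sample rows from the question, rebuilt so the snippet runs on its own.
df = pd.DataFrame({
    "Tm": ["Bills", "Dolphins", "Patriots", "Jets",
           "Cowboys", "Giants", "Eagles", "Commanders"],
    "Conference": ["AFC"] * 4 + ["NFC"] * 4,
    "Division": ["East"] * 8,
    "W-L%.": [0.813, 0.529, 0.471, 0.412, 0.706, 0.559, 0.824, 0.500],
    "Year": [2022] * 8,
})

grouped = df.groupby(["Conference", "Division", "Year"])["W-L%."]

# Both results are aligned with df's index, one value per row.
group_sum = grouped.transform("sum")
group_count = grouped.transform("count")  # non-missing values per group

# Leave-one-out mean: drop the row's own value, average over the rest.
df["Division W-L% NEW"] = (group_sum - df["W-L%."]) / (group_count - 1)

print(df)
```

Passing the aggregation by name lets pandas avoid calling a Python function once per group, which may matter on the full 2000-2022 data.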
