pandas 如果另一列为最小值,则合并行并对列求平均值

mcvgt66p  于 2023-02-07  发布在  其他
关注(0)|答案(5)|浏览(104)

我有一个Pandas数据框:

Server  Clock 1  Clock 2  Power   diff
0  PhysicalWindows1     3400   3300.0   58.5  100.0
1  PhysicalWindows1     3400   3500.0   63.0  100.0
2  PhysicalWindows1     3400   2900.0   25.0  500.0
3  PhysicalWindows2     3600   3300.0   83.8  300.0
4  PhysicalWindows2     3600   3500.0   65.0  100.0
5  PhysicalWindows2     3600   2900.0   10.0  700.0
6    PhysicalLinux1     2600      NaN    NaN    NaN
7    PhysicalLinux1     2600      NaN    NaN    NaN
8              Test     2700   2700.0   30.0    0.0

基本上,我希望平均每台服务器的功耗,但前提是差异最小。例如,如果您查看“PhysicalWindows 1”服务器,我有3行,其中两行的差异为100,一行的差异为500。由于我有两行的差异为100,因此我希望平均58.5和63.0的功耗。对于“PhysicalWindows 2”,因为只有一行具有最小的差异,所以我们返回这一行的幂-65。2如果是NaN,则返回Nan,如果只有一个匹配,则返回这一个匹配的幂。
生成的 Dataframe 如下所示:

Server  Clock 1            Power  
0  PhysicalWindows1     3400    (58.5+63.0)/2
1  PhysicalWindows2     3600             65.0
2    PhysicalLinux1     2600              NaN
3              Test     2700             30.0
ltskdhd1

ltskdhd11#

groupbydropna=False一起使用,以避免删除PhysicalLinux1和sort=True以排序索引级别(最低差异在顶部),然后删除drop_duplicates以仅保留(服务器,时钟1)的一个示例:

out = (df.groupby(['Server', 'Clock 1', 'diff'], dropna=False, sort=True)['Power']
         .mean().droplevel('diff').reset_index().drop_duplicates(['Server', 'Clock 1']))

# Output
             Server  Clock 1  Power
0    PhysicalLinux1     2600    NaN
1  PhysicalWindows1     3400  60.75
3  PhysicalWindows2     3600  65.00
6              Test     2700  30.00
vc9ivgsu

vc9ivgsu2#

以下是使用df.groupby()pd.merge()的可能解决方案

grp_df = df.groupby(['Server', 'diff'])['Power'].mean().reset_index()
grp_df = grp_df.groupby('Server').first().reset_index()
grp_df = grp_df.rename(columns={'diff': 'min_diff', 'Power': 'Power_avg'})

df_out = (pd.merge(df[['Server', 'Clock 1']].drop_duplicates(), grp_df, on='Server', how='left')
            .drop(['min_diff'], axis=1))
print(df_out)
Server  Clock 1  Power_avg
0  PhysicalWindows1     3400      60.75
1  PhysicalWindows2     3600      65.00
2    PhysicalLinux1     2600        NaN
3              Test     2700      30.00
ax6ht2ek

ax6ht2ek3#

使用双groupby,第一个groupby.transform用于屏蔽非最大功率,然后groupby.agg用于聚合

m = df.groupby('Server')['diff'].transform('min').eq(df['diff'])

(df.assign(Power=df['Power'].where(m))
 .groupby('Server', sort=False, as_index=False)
 .agg({'Clock 1': 'first', 'Power': 'mean'})
)

输出:

Server  Clock 1  Power
0  PhysicalWindows1     3400  60.75
1  PhysicalWindows2     3600  65.00
2    PhysicalLinux1     2600    NaN
3              Test     2700  30.00
qyyhg6bp

qyyhg6bp4#

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        "Server": ['PhysicalWindows1', 'PhysicalWindows1', 'PhysicalWindows1', 'PhysicalWindows2', 
                   'PhysicalWindows2', 'PhysicalWindows2', 'PhysicalLinux1', 'PhysicalLinux1', 'Test'],
        "Clock 1": [3400, 3400, 3400, 3600, 3600, 3600, 2600, 2600, 2700],
        "Clock 2": [3300.0, 3500.0, 2900.0, 3300.0, 3500.0, 2900.0, np.nan, np.nan, 2700.0],
        "Power": [58.5, 63.0, 25.0, 83.8, 65.0, 10.0, np.nan, np.nan, 30.0],
        "diff": [100.0, 100.0, 500.0, 300.0, 100.0, 700.0, np.nan, np.nan, 0.0]
    }
)

r = (df.groupby(['Server'])
       .apply(lambda d: d[d['diff']==d['diff'].min()])
       .reset_index(drop=True)
       .groupby(['Server'])
       .agg({"Clock 1":'mean', "Power":'mean', "diff":'first'})
       .reset_index()
     )

r = (r.append(df[df['diff'].isnull()]
                           .drop_duplicates()
                           ).drop(['Clock 2', 'diff'], axis=1)
                            .reset_index(drop=True)
                            )

print(r)
Server  Clock 1  Power
0  PhysicalWindows1   3400.0  60.75
1  PhysicalWindows2   3600.0  65.00
2              Test   2700.0  30.00
3    PhysicalLinux1   2600.0    NaN
nimxete2

nimxete25#

def function1(dd:pd.DataFrame):
    dd1=dd.loc[dd.loc[:,"diff"].eq(dd.loc[:,"diff"].min())]
    return pd.DataFrame({"Clock 1":dd1[["Clock 1"]].min().squeeze(),"Power":dd1.Power.mean()},index=[dd.name])

df1.groupby('Server',sort=False).apply(function1).droplevel(1)

出局

Clock 1  Power
Server                          
PhysicalWindows1   3400.0  60.75
PhysicalWindows2   3600.0  65.00
PhysicalLinux1        NaN    NaN
Test               2700.0  30.00

相关问题