When using Pandas groupby on two columns, how do I create new columns and aggregate by multiple metrics?

flseospp  posted on 2023-04-18  in Other

The data itself isn't really the issue.
I have the following code:

import numpy as np
import pandas as pd

# convert timestamp to milliseconds
relevant_data_pdf['milli'] = pd.to_datetime(relevant_data_pdf['timestamp']).astype(np.int64) / int(1e6)

# sort (day values are between 1 and 7)
relevant_data_pdf = relevant_data_pdf.sort_values(['id', 'inf_day', 'milli'])

# calculate the diff between two consecutive rows within each (id, inf_day) group
relevant_data_pdf['milli_diff'] = relevant_data_pdf.groupby(['id', 'inf_day'])['milli'].diff()

# aggregate by multiple metrics (string names avoid passing NumPy functions to agg)
relevant_data_pdf = relevant_data_pdf.groupby(['id', 'inf_day']).agg(
    avg=('milli_diff', 'mean'),
    median=('milli_diff', 'median'),
    max=('milli_diff', 'max'),
    min=('milli_diff', 'min'),
)

The result I get is this table:

                     avg     median         max        min
id inf_day                                                
1  1        8.060000e+06  7200000.0  16500000.0   480000.0
   2        1.200333e+06  1771000.0   1800000.0    30000.0
   3        5.400000e+06  5400000.0   7200000.0  3600000.0
2  0        1.800000e+06  1800000.0   3600000.0        0.0
   2        0.000000e+00        0.0         0.0        0.0
   3        0.000000e+00        0.0         0.0        0.0

How can I get one row per id, so that each id has every possible column of the form {inf_day}_{metric_name}?

moiiocjp  #1

There are a few ways to do this, where df is the aggregated DataFrame from the question:

(df.unstack(-1)
   .swaplevel(0, 1, axis=1)
   .sort_index(axis=1)) # .sort_index(axis=1, level=0, sort_remaining=False) - preserve output in OP, thanks constantstranger!

or

df.stack().unstack([-2, -1])

Either way, the result looks like this:

inf_day          0                                     1                                           2                                         3                                 
               avg        max     median  min        avg         max     median       min        avg        max     median      min        avg        max     median        min
id                                                                                                                                                                             
1              NaN        NaN        NaN  NaN  8060000.0  16500000.0  7200000.0  480000.0  1200333.0  1800000.0  1771000.0  30000.0  5400000.0  7200000.0  5400000.0  3600000.0
2        1800000.0  3600000.0  1800000.0  0.0        NaN         NaN        NaN       NaN        0.0        0.0        0.0      0.0        0.0        0.0        0.0        0.0
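
The two reshapes above can be tried side by side on a small frame. What follows is only a minimal sketch: df is rebuilt by hand as a stand-in for the aggregated table, with values copied from the printed output in the question (so they are rounded), and the names wide_a, wide_b and wide_b_sorted are made up for illustration. The second result is sorted by column before comparing, because the column order it produces may differ between pandas versions.

```python
import pandas as pd

# Hand-built stand-in for the aggregated frame from the question (assumption):
# an (id, inf_day) MultiIndex on the rows and one column per metric.
index = pd.MultiIndex.from_tuples(
    [(1, 1), (1, 2), (1, 3), (2, 0), (2, 2), (2, 3)],
    names=["id", "inf_day"],
)
df = pd.DataFrame(
    {
        "avg": [8060000.0, 1200333.0, 5400000.0, 1800000.0, 0.0, 0.0],
        "median": [7200000.0, 1771000.0, 5400000.0, 1800000.0, 0.0, 0.0],
        "max": [16500000.0, 1800000.0, 7200000.0, 3600000.0, 0.0, 0.0],
        "min": [480000.0, 30000.0, 3600000.0, 0.0, 0.0, 0.0],
    },
    index=index,
)

# Option 1: move inf_day into the columns, then make it the outer column level.
wide_a = df.unstack(-1).swaplevel(0, 1, axis=1).sort_index(axis=1)

# Option 2: stack the metrics into the index, then unstack inf_day and metric together.
wide_b = df.stack().unstack([-2, -1])
wide_b_sorted = wide_b.sort_index(axis=1)

print(wide_a.shape)  # (2, 16): one row per id, 4 days x 4 metrics
print(wide_a.equals(wide_b_sorted))
```

Combinations that never occur in the data, such as id 1 with inf_day 0, come out as NaN in both results.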

If you want to flatten the column names afterwards, I recommend the following:

unstacked = df.stack().unstack([-2, -1])
unstacked.columns = unstacked.columns.map('{0[0]}_{0[1]}'.format)
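
To see what the flattening step does to the labels, here is a small self-contained sketch. The frame is a made-up two-day, two-metric slice shaped like the reshaped result (an assumption for illustration, not the full output), and the final reset_index call is an optional extra that is not part of the answer above.

```python
import pandas as pd

# Made-up two-level columns shaped like the reshaped result: (inf_day, metric).
columns = pd.MultiIndex.from_product(
    [[0, 1], ["avg", "max"]], names=["inf_day", None]
)
nan = float("nan")
unstacked = pd.DataFrame(
    [[nan, nan, 8060000.0, 16500000.0],
     [1800000.0, 3600000.0, nan, nan]],
    index=pd.Index([1, 2], name="id"),
    columns=columns,
)

# Each column label is a tuple such as (0, "avg"); format joins its parts with "_".
unstacked.columns = unstacked.columns.map("{0[0]}_{0[1]}".format)
print(list(unstacked.columns))  # ['0_avg', '0_max', '1_avg', '1_max']

# Optional: turn id back into an ordinary column.
flat = unstacked.reset_index()
```

The format string indexes into the tuple it receives, so {0[0]} is the inf_day value and {0[1]} is the metric name, which gives exactly the {inf_day}_{metric_name} shape asked for.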
