python-3.x Pandas群首末行时间差

cngwdvgl  于 2023-04-13  发布在  Python
关注(0)|答案(3)|浏览(99)

输入

Date                  event   
2023-04-11 13:42:16   play   
2023-04-11 14:02:26   play 
2023-04-11 14:36:09   play 
2023-04-11 14:37:46   start
2023-04-11 14:41:34   start
2023-04-11 14:46:27   start
2023-04-11 14:47:03   start

在pandas dataframe中预期此。按事件顺序分组,按日期和第一次和最后一次时间差分组。
输出

date          event   diff
2023-04-11    play    00:53:52 
2023-04-11    start   00:09:17
5f0d552i

5f0d552i1#

使用自定义groupby.agg

df['Date'] = pd.to_datetime(df['Date'])

out = (
 df.groupby([df['Date'].dt.normalize(), 'event'])
   .agg(diff=('Date', lambda g: g.iloc[-1]-g.iloc[0]))
   .reset_index()
)

或者重复使用石斑鱼:

g = df.groupby([df['Date'].dt.normalize(), 'event'])['Date']

out = g.last().sub(g.first()).reset_index(name='diff')

输出:

Date  event            diff
0 2023-04-11   play 0 days 00:53:53
1 2023-04-11  start 0 days 00:09:17
非排序输入

如果数据最初未排序,则使用min/max作为聚合:

out = (
 df.groupby([df['Date'].dt.normalize(), 'event'])
   .agg(diff=('Date', lambda g: g.max()-g.min()))
   .reset_index()
)

或者:

g = df.groupby([df['Date'].dt.normalize(), 'event'])['Date']
out = g.max().sub(g.min()).reset_index(name='diff')

这在语义上与使用np.ptp相同,但由于某种原因(不幸的是),效率要高得多。

# max - min 
99.7 ms ± 2.33 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# np.ptp
355 ms ± 43.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
v1uwarro

v1uwarro2#

让我们做groupbyaggnp.ptp,即(最大-最小)

df['Date'] = pd.to_datetime(df['Date'])
df.groupby(['event', df['Date'].dt.date])['Date'].agg(np.ptp).reset_index(name='diff')

结果

event        Date            diff
0   play  2023-04-11 0 days 00:53:53
1  start  2023-04-11 0 days 00:09:17
ttcibm8c

ttcibm8c3#

df1.assign(date=df1.Date.dt.date).groupby('event',as_index=False)\
    .agg(date=("date",'first'),diff=("Date",lambda ss:str(ss.iat[-1]-ss.iat[0])[6:]))

结果

event        Date            diff
0   play  2023-04-11 0 days 00:53:53
1  start  2023-04-11 0 days 00:09:17

相关问题