根据条件计算与下一行的时间差,按PandasID分组

pn9klfpd  于 2023-02-11  发布在  其他
关注(0)|答案(2)|浏览(94)

与下一行、按ID分组、条件〉0的时间差需要在Pandas中计算
输入

ID          timestamp  Condition
aa  2023-1-5 06:33:27      23.33
aa  2023-1-5 06:33:33      13.26
aa  2023-1-5 06:33:39       5.71
aa  2023-1-5 06:33:45       0.00
aa  2023-1-5 06:33:51       0.00
aa  2023-1-5 06:33:57       0.00
aa  2023-1-5 06:46:15       0.00
aa  2023-1-5 06:46:21       0.00
aa  2023-1-5 06:46:27       2.18
aa  2023-1-5 06:46:33       0.00
aa  2023-1-5 06:46:39       4.10
aa  2023-1-5 06:46:45      21.73
aa  2023-1-5 06:46:51      33.79

输出:
时间戳_下一个|下一个时间差(秒)|
2023年1月5日6时33分33秒|六个|
2023年1月5日6时33分39秒|六个|
2023年1月5日6时46分27秒|七六八|

| |

| |

| |

||

||

2023年1月5日6时46分39秒|十二|

||

2023年1月5日6时46分45秒|六个|
2023年1月5日6时46分51秒|六个|
2023年1月5日6时46分57秒|六个|
样本代码

df2=df_input[(df_input['Condition']>0)]
       
df2['timestamp']= pd.to_datetime(df2['timestamp'])
df2['timestamp_next']=df2.groupby("id")["timestamp"].shift(-1)
df2['time_diff_next']=(df2['timestamp_next']-df2['timestamp'])/timedelta(seconds=1)

df_input=df_input.merge(df2[['id','timestamp','timestamp_next','time_diff_next']],how='left',on=['id','timestamp'])

我需要在不创建新 Dataframe df2的情况下实现此代码,如以上代码所示

dldeef67

dldeef671#

如果您必须计算每组的diffshift,则需要使用groupby以避免副作用。对于non-null Condition,在已筛选的DataFrame上使用groupby.diff似乎比较合适。
下面是在原始DataFrame中工作的一个建议:

m = df['Condition'].gt(0)
df['time_diff_next(seconds)'] = (df[m].groupby('ID')['timestamp']
                                 .diff().dt.total_seconds()
                                )

避免groupby的另一种方法可能是pivotmerge,但我预计性能会相当差:

df.merge(df[m].pivot(index='timestamp', columns='ID', values='timestamp'
                    ).diff().unstack().rename('time_diff_next(seconds)'),
         left_on=['ID', 'timestamp'], right_index=True, how='left')

输出:

ID           timestamp  Condition  time_diff_next(seconds)
0   aa 2023-01-05 06:33:27      23.33                      NaN
1   aa 2023-01-05 06:33:33      13.26                      6.0
2   aa 2023-01-05 06:33:39       5.71                      6.0
3   aa 2023-01-05 06:33:45       0.00                      NaN
4   aa 2023-01-05 06:33:51       0.00                      NaN
5   aa 2023-01-05 06:33:57       0.00                      NaN
6   aa 2023-01-05 06:46:15       0.00                      NaN
7   aa 2023-01-05 06:46:21       0.00                      NaN
8   aa 2023-01-05 06:46:27       2.18                    768.0
9   aa 2023-01-05 06:46:33       0.00                      NaN
10  aa 2023-01-05 06:46:39       4.10                     12.0
11  aa 2023-01-05 06:46:45      21.73                      6.0
12  aa 2023-01-05 06:46:51      33.79                      6.0
  • 注意:如果您想提高diff移位器,请使用diff(-1)并对输出求反:*
m = df['Condition'].gt(0)
df['time_diff_next(seconds)'] = (-df[m].groupby('ID')['timestamp']
                                 .diff(-1).dt.total_seconds()
                                 )

输出:

ID           timestamp  Condition  time_diff_next(seconds)
0   aa 2023-01-05 06:33:27      23.33                      6.0
1   aa 2023-01-05 06:33:33      13.26                      6.0
2   aa 2023-01-05 06:33:39       5.71                    768.0
3   aa 2023-01-05 06:33:45       0.00                      NaN
4   aa 2023-01-05 06:33:51       0.00                      NaN
5   aa 2023-01-05 06:33:57       0.00                      NaN
6   aa 2023-01-05 06:46:15       0.00                      NaN
7   aa 2023-01-05 06:46:21       0.00                      NaN
8   aa 2023-01-05 06:46:27       2.18                     12.0
9   aa 2023-01-05 06:46:33       0.00                      NaN
10  aa 2023-01-05 06:46:39       4.10                      6.0
11  aa 2023-01-05 06:46:45      21.73                      6.0
12  aa 2023-01-05 06:46:51      33.79                      NaN
xfyts7mz

xfyts7mz2#

您可以用途:

df['time_diff_next'] = (df.mask(df['Condition'].eq(0)).groupby('ID')['timestamp']
                          .transform(lambda x: x.diff().dt.total_seconds().shift(-1)))
print(df)

# Output
    ID           timestamp  Condition  time_diff_next
0   aa 2023-01-05 06:33:27      23.33             6.0
1   aa 2023-01-05 06:33:33      13.26             6.0
2   aa 2023-01-05 06:33:39       5.71           768.0
3   aa 2023-01-05 06:33:45       0.00             NaN
4   aa 2023-01-05 06:33:51       0.00             NaN
5   aa 2023-01-05 06:33:57       0.00             NaN
6   aa 2023-01-05 06:46:15       0.00             NaN
7   aa 2023-01-05 06:46:21       0.00             NaN
8   aa 2023-01-05 06:46:27       2.18            12.0
9   aa 2023-01-05 06:46:33       0.00             NaN
10  aa 2023-01-05 06:46:39       4.10             6.0
11  aa 2023-01-05 06:46:45      21.73             6.0
12  aa 2023-01-05 06:46:51      33.79             NaN

相关问题