Pandas equivalent of SQL LAG()

jjhzyzn0 posted on 2023-03-28 in Other

I have this DataFrame:

import pandas as pd

df = pd.DataFrame({
    'id': [10, 10, 10, 12, 12, 12, 12, 13, 13, 13],
    'session_id': [1, 3, 9, 1, 3, 5, 7, 1, 3, 5],
    'start_time': [5866, 6810, 8689, 8802, 8910, 9013, 9055, 9157, 9654, 9665],
    'end_time': [6808, 8653, 8722, 8881, 9001, 9049, 9062, 9651, 9659, 9725],
})

df.head()
    id  session_id  start_time  end_time
0   10       1        5866      6808
1   10       3        6810      8653
2   10       9        8689      8722
3   12       1        8802      8881
4   12       3        8910      9001

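I need a new stay_time column that stores how long the user stayed between the end of their previous session and the start of the current one (0 for each user's first session).
Desired output: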

   id  session_id  start_time  end_time  stay_time
0  10           1        5866      6808          0
1  10           3        6810      8653          2
2  10           9        8689      8722         36
3  12           1        8802      8881          0
4  12           3        8910      9001         29
5  12           5        9013      9049         12
6  12           7        9055      9062          6
7  13           1        9157      9651          0
8  13           3        9654      9659          3
9  13           5        9665      9725          6

In SQL, this is equivalent to:

-- assuming participants is the table
select p.*,
    start_time - lag(end_time, 1, start_time) over (partition by id order by session_id) as stay_time
from participants p
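A minimal sketch of a line-by-line pandas translation of that SQL, including LAG's third (default) argument. It assumes the rows should be ordered by session_id within each id, because pandas' shift() follows the current row order rather than an ORDER BY clause; the name prev_end is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [10, 10, 10, 12, 12, 12, 12, 13, 13, 13],
    'session_id': [1, 3, 9, 1, 3, 5, 7, 1, 3, 5],
    'start_time': [5866, 6810, 8689, 8802, 8910, 9013, 9055, 9157, 9654, 9665],
    'end_time': [6808, 8653, 8722, 8881, 9001, 9049, 9062, 9651, 9659, 9725],
})

# ORDER BY session_id within PARTITION BY id: shift() follows row order,
# so sort first (a no-op here, since this df is already sorted)
df = df.sort_values(['id', 'session_id'])

# LAG(end_time, 1, ...): previous row's end_time within each id, NaN on each id's first row
prev_end = df.groupby('id')['end_time'].shift(1)

# LAG's third argument: where there is no previous row, fall back to start_time
prev_end = prev_end.fillna(df['start_time'])

# start_time - start_time = 0 on each id's first row, as in the SQL
df['stay_time'] = (df['start_time'] - prev_end).astype(int)
print(df)
```

Filling with start_time instead of 0 mirrors the SQL exactly; both give 0 on the first session of each id.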
Answer #1 (xe55xuns)

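You can use groupby and shift as follows. Note that shift(-1) reads the next session's start_time, so each gap is stored on the earlier row (the LEAD() counterpart rather than LAG()); that is why this output is offset by one row from the desired output: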

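# Next session's start_time within the same id, minus this session's end_time
df['stay_time'] = df.groupby('id')['start_time'].shift(-1) - df['end_time']
# Each id's last session has no next session (NaN); set it to 0
df['stay_time'] = df['stay_time'].fillna(0).astype(int)
# Redundant safeguard: the grouped shift above already leaves these last rows at 0
df.loc[df['id'] != df['id'].shift(-1), 'stay_time'] = 0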

It gives:

   id  session_id  start_time  end_time  stay_time
0  10           1        5866      6808          2
1  10           3        6810      8653         36
2  10           9        8689      8722          0
3  12           1        8802      8881         29
4  12           3        8910      9001         12
5  12           5        9013      9049          6
6  12           7        9055      9062          0
7  13           1        9157      9651          3
8  13           3        9654      9659          6
9  13           5        9665      9725          0
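The only difference from the desired output is the direction of the shift. A small sketch, assuming the question's df, showing that shift(-1) behaves like SQL LEAD() and shift(1) like LAG(), and that shifting the shift(-1) result down one row within each id recovers the desired stay_time. The names lead_gap, lag_gap and realigned are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [10, 10, 10, 12, 12, 12, 12, 13, 13, 13],
    'session_id': [1, 3, 9, 1, 3, 5, 7, 1, 3, 5],
    'start_time': [5866, 6810, 8689, 8802, 8910, 9013, 9055, 9157, 9654, 9665],
    'end_time': [6808, 8653, 8722, 8881, 9001, 9049, 9062, 9651, 9659, 9725],
})

g = df.groupby('id')

# shift(-1) ~ LEAD(): gap until the NEXT session, stored on the earlier row
lead_gap = (g['start_time'].shift(-1) - df['end_time']).fillna(0).astype(int)

# shift(1) ~ LAG(): gap since the PREVIOUS session, stored on the later row
lag_gap = (df['start_time'] - g['end_time'].shift(1)).fillna(0).astype(int)

# Same gaps, offset by one row within each id
realigned = lead_gap.groupby(df['id']).shift(1).fillna(0).astype(int)
assert realigned.equals(lag_gap)

print(pd.DataFrame({'id': df['id'], 'lead_gap': lead_gap, 'lag_gap': lag_gap}))
```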
Answer #2 (4xrmg8kj)

Use Series.sub together with SeriesGroupBy.shift and Series.fillna:

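# Previous session's end_time within the same id is NaN for each id's first session;
# fillna's downcast= keyword is deprecated since pandas 2.1, so cast explicitly
df['stay_time'] = (df['start_time'].sub(df.groupby('id')['end_time'].shift())
                                   .fillna(0)
                                   .astype(int))
print(df)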
   id  session_id  start_time  end_time  stay_time
0  10           1        5866      6808          0
1  10           3        6810      8653          2
2  10           9        8689      8722         36
3  12           1        8802      8881          0
4  12           3        8910      9001         29
5  12           5        9013      9049         12
6  12           7        9055      9062          6
7  13           1        9157      9651          0
8  13           3        9654      9659          3
9  13           5        9665      9725          6
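Both answers rely on the rows already being sorted by session_id within each id. A minimal sketch of a reusable helper that applies the window's ORDER BY explicitly and returns values in the caller's original row order. sql_lag is a hypothetical name, not a pandas API; it assumes a unique index, treats a string default as a column name (like start_time in the SQL above), and a negative offset gives LEAD().

```python
import pandas as pd


def sql_lag(df, column, partition_by, order_by, offset=1, default=None):
    """Hypothetical helper approximating
    LAG(column, offset, default) OVER (PARTITION BY partition_by ORDER BY order_by).

    partition_by and order_by are lists of column names. default is a scalar,
    a column name (str), or None to leave missing values as NaN (SQL NULL).
    Assumes df has a unique index.
    """
    # shift() follows row order, so apply the window's ORDER BY explicitly
    ordered = df.sort_values(partition_by + order_by)
    lagged = ordered.groupby(partition_by)[column].shift(offset)
    if default is not None:
        # A string is treated as a column name, anything else as a scalar
        fill = ordered[default] if isinstance(default, str) else default
        lagged = lagged.fillna(fill)
    # Align the result back to the caller's original row order
    return lagged.reindex(df.index)


df = pd.DataFrame({
    'id': [10, 10, 10, 12, 12, 12, 12, 13, 13, 13],
    'session_id': [1, 3, 9, 1, 3, 5, 7, 1, 3, 5],
    'start_time': [5866, 6810, 8689, 8802, 8910, 9013, 9055, 9157, 9654, 9665],
    'end_time': [6808, 8653, 8722, 8881, 9001, 9049, 9062, 9651, 9659, 9725],
})

# Shuffle the rows to show the result does not depend on the input order
shuffled = df.sample(frac=1, random_state=0)
prev_end = sql_lag(shuffled, 'end_time', ['id'], ['session_id'], default='start_time')
shuffled['stay_time'] = (shuffled['start_time'] - prev_end).astype(int)
print(shuffled.sort_index())
```

Sorting inside the helper costs an extra O(n log n) pass, which is the price of not trusting the input order the way a plain groupby().shift() does.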
