Python Pandas: time difference in a DataFrame between specific alternating values

mrwjdhj3, posted 2023-10-14 in Python

I have a DataFrame of app usage data with 4 columns that looks like this:

Id  Timestamp                App_Name   Event_Type
1   2018/01/16 06:01:05     Instagram   Opened
2   2018/01/16 06:01:06     Instagram   Closed
3   2018/01/16 06:01:07     Instagram   Opened
4   2018/01/16 06:01:08     Instagram   Interaction
5   2018/01/16 06:01:09     Instagram   Interaction
6   2018/01/16 06:02:08     Instagram   Closed
7   2018/01/16 06:01:08     Instagram   Opened
8   2018/01/16 06:01:08     Instagram   Opened
9   2018/01/16 06:01:09     Instagram   Opened
10  2018/01/16 06:01:09     Instagram   Closed
11  2018/01/16 06:03:44     Instagram   Opened
12  2018/01/16 06:03:44     Instagram   Closed
13  2018/01/16 06:03:45     Instagram   Closed
14  2018/01/16 06:03:45     Instagram   Closed
15  2018/01/16 06:03:47     Instagram   Opened

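I want to get the time difference (in seconds) between each "Opened" row and the "Closed" row that follows it, regardless of whether other Event_Type values occur in between. Several consecutive Opened or Closed rows can appear, possibly because of errors; in that case I only want the difference between the last Opened and the first Closed. So in this case, I want the time differences between these Ids: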

  • 2 and 1
  • 6 and 3
  • 10 and 9
  • 12 and 11

How can I do this?
Thanks.
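To try the answers below, the sample can be loaded into a DataFrame. This is only a setup sketch: it assumes the columns are separated by two or more spaces (as in the listing above) and that the timestamps follow the `%Y/%m/%d %H:%M:%S` pattern. The name `raw` is introduced here for illustration.

```python
import io

import pandas as pd

# Sample rows copied from the question; columns are separated by 2+ spaces.
raw = """Id  Timestamp            App_Name   Event_Type
1   2018/01/16 06:01:05  Instagram  Opened
2   2018/01/16 06:01:06  Instagram  Closed
3   2018/01/16 06:01:07  Instagram  Opened
4   2018/01/16 06:01:08  Instagram  Interaction
5   2018/01/16 06:01:09  Instagram  Interaction
6   2018/01/16 06:02:08  Instagram  Closed
7   2018/01/16 06:01:08  Instagram  Opened
8   2018/01/16 06:01:08  Instagram  Opened
9   2018/01/16 06:01:09  Instagram  Opened
10  2018/01/16 06:01:09  Instagram  Closed
11  2018/01/16 06:03:44  Instagram  Opened
12  2018/01/16 06:03:44  Instagram  Closed
13  2018/01/16 06:03:45  Instagram  Closed
14  2018/01/16 06:03:45  Instagram  Closed
15  2018/01/16 06:03:47  Instagram  Opened
"""

# A regex separator needs the Python parsing engine; the single space inside
# each timestamp is not matched by \s{2,}, so the timestamp stays one field.
df = pd.read_csv(io.StringIO(raw), sep=r"\s{2,}", engine="python")
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%Y/%m/%d %H:%M:%S")
```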


fgw7neuy1#

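Here is another, somewhat more involved approach: line up the Opened and Closed records side by side and subtract the timestamp columns.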

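import pandas as pd

df['Timestamp'] = pd.to_datetime(df['Timestamp'])

df = df.set_index('Id')
# Every 'Opened' row starts a new group, so each group holds exactly one
# 'Opened' row followed by whatever comes before the next 'Opened'
df['g'] = (df['Event_Type'] == 'Opened').cumsum()

# The 'Opened' row and the first 'Closed' row of each group
df_open = df.query('Event_Type == "Opened"').groupby('g').head(1)
df_close = df.query('Event_Type == "Closed"').groupby('g').head(1)

# Inner merge: groups that have no 'Closed' row drop out
df_result = df_open.merge(df_close, on='g', suffixes=('_Opened', '_Closed'))
df_result['Timedelta'] = df_result['Timestamp_Closed'] - df_result['Timestamp_Opened']

df_result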

Output:

Timestamp_Opened App_Name_Opened Event_Type_Opened  g    Timestamp_Closed App_Name_Closed Event_Type_Closed       Timedelta
0 2018-01-16 06:01:05       Instagram            Opened  1 2018-01-16 06:01:06       Instagram            Closed 0 days 00:00:01
1 2018-01-16 06:01:07       Instagram            Opened  2 2018-01-16 06:02:08       Instagram            Closed 0 days 00:01:01
2 2018-01-16 06:01:09       Instagram            Opened  5 2018-01-16 06:01:09       Instagram            Closed 0 days 00:00:00
3 2018-01-16 06:03:44       Instagram            Opened  6 2018-01-16 06:03:44       Instagram            Closed 0 days 00:00:00
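The question asks for seconds, while the table above shows Timedelta values. A minimal sketch of the same grouping idea that ends with `.dt.total_seconds()` is shown here. It differs from the answer in one assumed detail: `Id` stays a regular column instead of becoming the index, so the merged frame keeps both Ids. The function name `open_close_seconds` and the shortened `sample` frame are made up for illustration.

```python
import pandas as pd


def open_close_seconds(df):
    """Pair each 'Opened' row with the first 'Closed' row that follows it
    (before the next 'Opened') and return the gap in seconds."""
    df = df.copy()
    df["Timestamp"] = pd.to_datetime(df["Timestamp"])
    # Every 'Opened' row starts a new group.
    df["g"] = (df["Event_Type"] == "Opened").cumsum()
    opened = df[df["Event_Type"] == "Opened"].groupby("g").head(1)
    closed = df[df["Event_Type"] == "Closed"].groupby("g").head(1)
    pairs = opened.merge(closed, on="g", suffixes=("_Opened", "_Closed"))
    delta = pairs["Timestamp_Closed"] - pairs["Timestamp_Opened"]
    pairs["Seconds"] = delta.dt.total_seconds()
    return pairs[["Id_Opened", "Id_Closed", "Seconds"]]


# A shortened version of the question's data (rows 1-6 and 11-13).
sample = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6, 11, 12, 13],
    "Timestamp": [
        "2018/01/16 06:01:05", "2018/01/16 06:01:06", "2018/01/16 06:01:07",
        "2018/01/16 06:01:08", "2018/01/16 06:01:09", "2018/01/16 06:02:08",
        "2018/01/16 06:03:44", "2018/01/16 06:03:44", "2018/01/16 06:03:45",
    ],
    "Event_Type": [
        "Opened", "Closed", "Opened", "Interaction", "Interaction", "Closed",
        "Opened", "Closed", "Closed",
    ],
})

result = open_close_seconds(sample)
print(result)
```

On this shortened sample the pairs are (1, 2), (3, 6) and (11, 12); the extra Closed row 13 is ignored because only the first Closed of each group is kept.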

r8xiu3jd2#

Try:

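# state holds the Id of the most recent unmatched 'Opened' row (None if there is none)
out, state = [], None
for i, e in zip(df["Id"], df["Event_Type"]):
    if e == "Opened":
        # A later 'Opened' overwrites an earlier one, so only the last one counts
        state = i
    elif e == "Closed" and state is not None:
        out.append([state, i])
        # Reset, so further consecutive 'Closed' rows are ignored
        state = None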

print(out)

Prints:

[[1, 2], [3, 6], [9, 10], [11, 12]]

To get the time differences:

df["Timestamp"] = pd.to_datetime(df["Timestamp"])

out, state = [], None
for i, e in zip(df.index, df["Event_Type"]):
    if e == "Opened":
        state = i
    elif e == "Closed" and state is not None:
        out.append(df.loc[i, "Timestamp"] - df.loc[state, "Timestamp"])
        state = None

print(out)

Prints:

[Timedelta('0 days 00:00:01'), Timedelta('0 days 00:01:01'), Timedelta('0 days 00:00:00'), Timedelta('0 days 00:00:00')]
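Since the question asks for seconds, the Timedelta objects can be converted with `Timedelta.total_seconds()`. A small sketch, where the `out` list is rebuilt by hand as a stand-in for the list printed above:

```python
import pandas as pd

# Stand-in for the `out` list printed above.
out = [
    pd.Timedelta(seconds=1),
    pd.Timedelta(minutes=1, seconds=1),
    pd.Timedelta(0),
    pd.Timedelta(0),
]

# total_seconds() returns the whole duration as a float number of seconds.
seconds = [delta.total_seconds() for delta in out]
print(seconds)  # [1.0, 61.0, 0.0, 0.0]
```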

yzuktlbb3#

Step 1

Convert the Timestamp column to the datetime dtype

df['Timestamp'] = pd.to_datetime(df['Timestamp'])

Step 2

Create groups that each end with a "Closed" row. Name the grouping key grp

cond = df['Event_Type'].eq('Closed')
grp = cond.cumsum() - cond
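To see why `cond.cumsum() - cond` produces groups that end on a Closed row, it may help to run it on a short, made-up event sequence (the `events` Series here is illustrative, not the question's data):

```python
import pandas as pd

events = pd.Series(
    ["Opened", "Closed", "Opened", "Interaction", "Closed", "Closed", "Opened"]
)

cond = events.eq("Closed")
# cumsum() goes up by one on every Closed row. Subtracting cond (True counts
# as 1) cancels that increase on the Closed row itself, so the new group
# number only takes effect on the row after it.
grp = cond.cumsum() - cond

print(grp.tolist())  # [0, 0, 1, 1, 1, 2, 3]
```

Each Closed row is therefore the last row of its group, and a second Closed in a row ends up alone in a group of its own.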

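Step 3

First, drop the rows of df whose Event_Type is Interaction. Then compute diff(1) of the Timestamp column within each grp. Finally, keep only the rows where Event_Type is Closed, and drop the NaN values.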

(df[df['Event_Type'].ne('Interaction')]
  .groupby(grp)['Timestamp'].diff(1)[cond]
  .dropna())

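Output: for the sample data, four time differences remain, at the Closed rows with Id 2, 6, 10 and 12: 1 second, 61 seconds, 0 seconds and 0 seconds.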
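The three steps can be put together and converted to seconds roughly as follows. This is a sketch on a shortened version of the question's data (rows 1-6 and 11-13); the names `kept`, `diffs` and `seconds` are introduced here, and the grouping key and mask are aligned to the filtered rows explicitly rather than relying on pandas to align them.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6, 11, 12, 13],
    "Timestamp": [
        "2018/01/16 06:01:05", "2018/01/16 06:01:06", "2018/01/16 06:01:07",
        "2018/01/16 06:01:08", "2018/01/16 06:01:09", "2018/01/16 06:02:08",
        "2018/01/16 06:03:44", "2018/01/16 06:03:44", "2018/01/16 06:03:45",
    ],
    "Event_Type": [
        "Opened", "Closed", "Opened", "Interaction", "Interaction", "Closed",
        "Opened", "Closed", "Closed",
    ],
})

# Step 1: datetime dtype
df["Timestamp"] = pd.to_datetime(df["Timestamp"])

# Step 2: groups that end on a Closed row
cond = df["Event_Type"].eq("Closed")
grp = cond.cumsum() - cond

# Step 3: drop Interaction rows, diff within each group, keep Closed rows
kept = df[df["Event_Type"].ne("Interaction")]
diffs = kept.groupby(grp.loc[kept.index])["Timestamp"].diff()
seconds = diffs[cond.loc[kept.index]].dropna().dt.total_seconds()

# Label the result with the Id of the Closed row it belongs to
seconds.index = df.loc[seconds.index, "Id"]
print(seconds)
```

Because `diff()` compares each Closed row with the row directly before it in the group, this relies on Opened, Closed and Interaction being the only event types; any other type would also have to be filtered out first.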
