pandas: How to find the most optimised solution to iterate through windowed time stamps, multi-column lists and their values?

Posted by olmpazwi on 2023-02-28 in Other


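I am working on a more optimised pandas solution to determine whether a person who entered a room left that room within 1 hour. The "Enter" and "Exit" fields are lists of the names of the people who entered or left. How can I optimise my solution without using multiple for loops and iterrows/itertuples?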
| Time Stamp | RoomID | Enter | Exit |
| --- | --- | --- | --- |
| 2022-01-01 00:10:10 | 1 | Tom, Mary, Jane | Nan |
| 2022-01-01 00:10:12 | 2 | Nan | Harry, Jay |
| 2022-01-01 00:10:19 | 3 | Nan | Nathan |
| 2022-01-01 00:11:26 | 2 | Barry, Allen, Jerry | Nan |
| 2022-01-01 00:12:37 | 1 | Nan | Jack, Jane |
The resulting DataFrame should contain the names of the people who entered and left within 1 hour.
| Time Stamp | Name | RoomID |
| --- | --- | --- |
| 2022-01-01 00:10:10, 2022-01-01 00:12:37 | Jane | 1 |
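There are so many for loops involved that I don't even know how to start iterating.
Thanks for your help! I am new to pandas and would appreciate any advice!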
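
To make the question reproducible, the sample table could be built roughly as follows. This is only a sketch and not part of the original post: the column names are taken from the answer's code, while the comma-separated name lists and the English spellings of the names are assumptions.

```python
import pandas as pd

# Sample data transcribed from the question's table.
# Assumptions: names are separated by ", " and "Nan" is a literal
# placeholder string meaning "nobody".
df = pd.DataFrame({
    "Time Stamp": [
        "2022-01-01 00:10:10",
        "2022-01-01 00:10:12",
        "2022-01-01 00:10:19",
        "2022-01-01 00:11:26",
        "2022-01-01 00:12:37",
    ],
    "RoomID": [1, 2, 3, 2, 1],
    "Enter": ["Tom, Mary, Jane", "Nan", "Nan", "Barry, Allen, Jerry", "Nan"],
    "Exit": ["Nan", "Harry, Jay", "Nathan", "Nan", "Jack, Jane"],
})

print(df)
```

At this point "Time Stamp" still holds strings; it needs to be converted with pd.to_datetime before any time-based comparison.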

pandas

Source: https://stackoverflow.com/questions/75518390/how-to-find-the-most-optimised-solution-to-iterate-through-windowed-time-stamps

1 Answer


5jvtdoz21#

IIUC, you can use merge_asof after exploding the columns:

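import pandas as pd

# Parse the timestamps; merge_asof needs its time keys sorted in ascending order.
df['Time Stamp'] = pd.to_datetime(df['Time Stamp'])
df = df.sort_values('Time Stamp')

# Split the comma-separated names into lists, then give each name its own row.
df2 = (
    df.assign(Enter=df['Enter'].str.split(r',\s*'),
              Exit=df['Exit'].str.split(r',\s*'))
      .explode('Enter').explode('Exit')
      .replace('Nan', pd.NA)
)

# For each entry, find the first exit of the same person from the same room
# that happens at or after the entry time and at most 1 hour later.
out = pd.merge_asof(
    df2.dropna(subset=['Enter'])[['Time Stamp', 'RoomID', 'Enter']],
    df2.dropna(subset=['Exit'])[['Time Stamp', 'RoomID', 'Exit']]
       .rename(columns={'Time Stamp': 'Time Stamp Exit'}),
    left_on='Time Stamp', right_on='Time Stamp Exit',
    left_by=['RoomID', 'Enter'],
    right_by=['RoomID', 'Exit'],
    direction='forward', tolerance=pd.Timedelta('1h'),
).dropna(subset=['Exit'])  # keep only entries that found a matching exit

print(out)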

Output:

           Time Stamp  RoomID Enter     Time Stamp Exit  Exit
2 2022-01-01 00:10:10       1  Jane 2022-01-01 00:12:37  Jane
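
If the goal is the exact shape asked for in the question (one row per person, with the room and both timestamps), the same explode plus merge_asof idea can be wrapped in a small helper. This is a minimal sketch and not the answerer's code: the function name `left_within`, the output column names `Enter Time`, `Exit Time` and `Name`, and the sample data (name spellings and comma separators) are all assumptions made for illustration.

```python
import pandas as pd


def left_within(df: pd.DataFrame, window: str = "1h") -> pd.DataFrame:
    """Return people who left a room within `window` of entering it.

    Hypothetical helper; expects the columns "Time Stamp", "RoomID",
    "Enter" and "Exit", with names comma-separated and "Nan" for nobody.
    """
    df = df.assign(**{"Time Stamp": pd.to_datetime(df["Time Stamp"])})

    def long_form(col: str) -> pd.DataFrame:
        # One row per (timestamp, room, person) for the given column.
        names = df[col].str.split(r",\s*", regex=True)
        out = df[["Time Stamp", "RoomID"]].assign(Name=names).explode("Name")
        # Drop missing values and the literal "Nan" placeholder.
        out = out[out["Name"].notna() & (out["Name"] != "Nan")]
        # merge_asof requires both frames to be sorted by the time key.
        return out.sort_values("Time Stamp", kind="stable").reset_index(drop=True)

    enters = long_form("Enter")
    exits = long_form("Exit").rename(columns={"Time Stamp": "Exit Time"})

    matched = pd.merge_asof(
        enters, exits,
        left_on="Time Stamp", right_on="Exit Time",
        by=["RoomID", "Name"],           # same person, same room
        direction="forward",             # exit at or after the entry
        tolerance=pd.Timedelta(window),  # and no later than `window`
    ).dropna(subset=["Exit Time"])

    matched = matched.rename(columns={"Time Stamp": "Enter Time"})
    return matched[["Enter Time", "Exit Time", "Name", "RoomID"]].reset_index(drop=True)


# Sample data assumed from the question's table.
sample = pd.DataFrame({
    "Time Stamp": ["2022-01-01 00:10:10", "2022-01-01 00:10:12",
                   "2022-01-01 00:10:19", "2022-01-01 00:11:26",
                   "2022-01-01 00:12:37"],
    "RoomID": [1, 2, 3, 2, 1],
    "Enter": ["Tom, Mary, Jane", "Nan", "Nan", "Barry, Allen, Jerry", "Nan"],
    "Exit": ["Nan", "Harry, Jay", "Nathan", "Nan", "Jack, Jane"],
})

result = left_within(sample)
print(result)
```

Note that merge_asof pairs each entry with the nearest later exit of the same person in the same room, so two entries that precede a single exit would both be matched to it. Whether that is acceptable depends on the data.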

Answered 2023-02-28
