pandas Python -包含另一个表中的时间数据

swvgeqrz  于 2022-12-21  发布在  Python
关注(0)|答案(1)|浏览(106)

我有这两个 Dataframe :

import pandas as pd
from pandas import Timestamp
df_1 = pd.DataFrame({'id': {0: 'A',
  1: 'A',
  2: 'B',
  3: 'C',
  4: 'C'},
 'IdOrder': {0: 1, 1: 2, 2: 1, 3: 1, 4: 2},
 'TrackDateTime': {0: Timestamp('2020-01-21 23:28:35'),
  1: Timestamp('2020-01-28 17:12:15'),
  2: Timestamp('2020-01-07 12:41:48'),
  3: Timestamp('2020-01-01 22:13:44'),
  4: Timestamp('2020-01-01 22:49:53')}})
df_1

df_2 = pd.DataFrame({'id': {0: 'A',
  1: 'B',
  2: 'C',
  3: 'D',
  4: 'E'},
 'InitialDate': {0: Timestamp('2020-01-21 23:28:35'),
  1: Timestamp('2020-01-07 12:41:48'),
  2: Timestamp('2020-01-01 22:13:44'),
  3: Timestamp('2020-01-02 15:45:10'),
  4: Timestamp('2020-01-02 22:21:36')},
 'EndDate': {0: Timestamp('2020-01-28 00:00:00'),
  1: Timestamp('2020-01-08 00:00:00'),
  2: Timestamp('2020-01-03 00:00:00'),
  3: Timestamp('2020-01-06 00:00:00'),
  4: Timestamp('2020-04-10 00:00:00')}})
df_2

我希望用df_2中的InitialDateEndDatedf_1中的每个id括起来,这将给予以下预期输出:

还请考虑:

  • IdOrder=0用于InitialDateIdOrder=-1用于EndDate
  • 我想保持时间粒度达到秒(输出图像不显示,因为excel格式)

我还没能找到解决这个问题的办法。有什么建议吗?:)

kr98yfug

kr98yfug1#

您可以使用meltconcat和自定义排序:

tmp = (
    df_2[df_2['id'].isin(df_1['id'])]
 .rename(columns={'InitialDate':0, 'EndDate': -1})
 .melt('id', var_name='IdOrder', value_name='TrackDateTime')
)

out = (pd.concat([tmp, df_1])
         .sort_values(by='IdOrder', key=lambda s: s.replace(-1, np.inf))
         .sort_values(by='id')
       )

print(out)

输出:

id IdOrder       TrackDateTime
0  A       0 2020-01-21 23:28:35
0  A       1 2020-01-21 23:28:35
1  A       2 2020-01-28 17:12:15
3  A      -1 2020-01-28 00:00:00
1  B       0 2020-01-07 12:41:48
2  B       1 2020-01-07 12:41:48
4  B      -1 2020-01-08 00:00:00
2  C       0 2020-01-01 22:13:44
3  C       1 2020-01-01 22:13:44
4  C       2 2020-01-01 22:49:53
5  C      -1 2020-01-03 00:00:00

相关问题