pandas: compute columns in one DataFrame from the nearest values in another DataFrame

Posted by rdrgkggo on 2023-01-15 in Other

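I have two DataFrames, coarse and fine. fine contains the columns start_time, end_time, start_price, end_price; coarse contains start_time, end_time. Note that all times are pandas Timestamp objects, e.g. 2016-12-12 01:03:13.15231+00:00.
For coarse, I need to add two extra columns, start_price and end_price, where coarse.start_price is the fine.start_price of the row whose fine.start_time is closest to coarse.start_time, and likewise for end_price.
Here is a clear example: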

coarse:

start_time                              end_time
2016-12-12 01:00:00.000+00:00           2016-12-12 02:00:00.000+00:00
2016-12-12 02:00:00.000+00:00           2016-12-12 03:00:00.000+00:00
2016-12-12 03:00:00.000+00:00           2016-12-12 03:30:00.000+00:00
fine:

start_time                              end_time                         start_price 
2016-12-12 00:59:00.000+00:00           2016-12-12 01:12:00.000+00:00    2.3
2016-12-12 01:12:00.000+00:00           2016-12-12 01:15:00.000+00:00    4.5
2016-12-12 01:15:00.000+00:00           2016-12-12 01:45:00.000+00:00    5.7
2016-12-12 01:45:00.000+00:00           2016-12-12 01:55:00.000+00:00    8.8
2016-12-12 01:55:00.000+00:00           2016-12-12 02:15:00.000+00:00    9.9
2016-12-12 02:15:00.000+00:00           2016-12-12 02:16:00.000+00:00    11.2
2016-12-12 02:16:00.000+00:00           2016-12-12 02:31:00.000+00:00    13.5
2016-12-12 02:31:00.000+00:00           2016-12-12 02:45:00.000+00:00    14.8
2016-12-12 02:45:00.000+00:00           2016-12-12 02:59:00.000+00:00    15.9
2016-12-12 02:59:00.000+00:00           2016-12-12 03:31:00.000+00:00    16.0

So the result should be:

coarse:
 
start_time                              end_time                         start_price
2016-12-12 01:00:00.000+00:00           2016-12-12 02:00:00.000+00:00    2.3
2016-12-12 02:00:00.000+00:00           2016-12-12 03:00:00.000+00:00    9.9
2016-12-12 03:00:00.000+00:00           2016-12-12 03:30:00.000+00:00    16.0

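(I skipped end_price because it follows the same logic.) What is the simplest way to do this?
Also, I might want to interpolate the price based on time instead of using the nearest neighbor. Is there a simple way to do that as well?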

yacmzcpb  #1

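You are probably looking for pd.merge_asof, which is well suited to joins where the times do not match exactly. It has a direction parameter that you can adjust so that matching goes the way you want.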
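As a quick illustration of the direction parameter, here is a minimal sketch on toy data. The frames left and right and the column names t and price are made up for this example and are not part of the question's data.

```python
import pandas as pd

# Hypothetical toy data: one query time that falls between two known quotes.
left = pd.DataFrame({"t": pd.to_datetime(["2016-12-12 01:10:00+00:00"])})
right = pd.DataFrame({
    "t": pd.to_datetime(["2016-12-12 01:00:00+00:00",
                         "2016-12-12 01:12:00+00:00"]),
    "price": [2.3, 4.5],
})

# merge_asof expects both frames to be sorted by the key column.
# 'backward' takes the last right row at or before the key,
# 'forward' takes the first right row at or after the key,
# 'nearest' takes whichever of the two is closer in time.
results = {
    d: float(pd.merge_asof(left, right, on="t", direction=d)["price"].iloc[0])
    for d in ("backward", "forward", "nearest")
}
print(results)  # {'backward': 2.3, 'forward': 4.5, 'nearest': 4.5}
```

The solution for the question's data uses the same call with direction='nearest'.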

import pandas as pd

coarse = pd.DataFrame({'start_time': ['2016-12-12 01:00:00.000+00:00', '2016-12-12 02:00:00.000+00:00', '2016-12-12 03:00:00.000+00:00'], 'end_time': ['2016-12-12 02:00:00.000+00:00', '2016-12-12 03:00:00.000+00:00', '2016-12-12 03:30:00.000+00:00']} )

fine = pd.DataFrame({'start_time': ['2016-12-12 00:59:00.000+00:00', '2016-12-12 01:12:00.000+00:00', '2016-12-12 01:15:00.000+00:00', '2016-12-12 01:45:00.000+00:00', '2016-12-12 01:55:00.000+00:00', '2016-12-12 02:15:00.000+00:00', '2016-12-12 02:16:00.000+00:00', '2016-12-12 02:31:00.000+00:00', '2016-12-12 02:45:00.000+00:00', '2016-12-12 02:59:00.000+00:00'], 'end_time': ['2016-12-12 01:12:00.000+00:00', '2016-12-12 01:15:00.000+00:00', '2016-12-12 01:45:00.000+00:00', '2016-12-12 01:55:00.000+00:00', '2016-12-12 02:15:00.000+00:00', '2016-12-12 02:16:00.000+00:00', '2016-12-12 02:31:00.000+00:00', '2016-12-12 02:45:00.000+00:00', '2016-12-12 02:59:00.000+00:00', '2016-12-12 03:31:00.000+00:00'], 'start_price': [2.3, 4.5, 5.7, 8.8, 9.9, 11.2, 13.5, 14.8, 15.9, 16.0]})

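# Parse the string columns into timezone-aware pandas Timestamps
coarse['start_time'] = pd.to_datetime(coarse['start_time'])
fine['start_time'] = pd.to_datetime(fine['start_time'])
coarse['end_time'] = pd.to_datetime(coarse['end_time'])
fine['end_time'] = pd.to_datetime(fine['end_time'])

# merge_asof requires both frames to be sorted by the key column (they already are here).
# Attach the start_price of the fine row whose start_time is nearest to each coarse start_time.
coarse = pd.merge_asof(coarse, fine[['start_time','start_price']],
                       on='start_time',
                       direction='nearest')

# Same idea for the end of each interval: match on end_time. The example fine table
# has no end_price column, so start_price is reused and renamed to end_price.
coarse = pd.merge_asof(coarse,
                       fine[['end_time','start_price']].rename(columns={'start_price':'end_price'}),
                       on='end_time',
                       direction='nearest')

print(coarse)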

Output:

                 start_time                  end_time  start_price  end_price
0 2016-12-12 01:00:00+00:00 2016-12-12 02:00:00+00:00          2.3        8.8
1 2016-12-12 02:00:00+00:00 2016-12-12 03:00:00+00:00          9.9       15.9
2 2016-12-12 03:00:00+00:00 2016-12-12 03:30:00+00:00         16.0       16.0
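
For the follow-up about interpolating the price over time instead of taking the nearest neighbor, one possible approach (not part of the original answer) is linear interpolation with numpy.interp. The sketch below reuses the question's fine.start_time and start_price values; the helper name interpolate_price is hypothetical, and it assumes the known times are sorted in increasing order.

```python
import numpy as np
import pandas as pd

fine = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2016-12-12 00:59:00+00:00", "2016-12-12 01:12:00+00:00",
        "2016-12-12 01:15:00+00:00", "2016-12-12 01:45:00+00:00",
        "2016-12-12 01:55:00+00:00", "2016-12-12 02:15:00+00:00",
        "2016-12-12 02:16:00+00:00", "2016-12-12 02:31:00+00:00",
        "2016-12-12 02:45:00+00:00", "2016-12-12 02:59:00+00:00",
    ]),
    "start_price": [2.3, 4.5, 5.7, 8.8, 9.9, 11.2, 13.5, 14.8, 15.9, 16.0],
})
coarse = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2016-12-12 01:00:00+00:00",
        "2016-12-12 02:00:00+00:00",
        "2016-12-12 03:00:00+00:00",
    ]),
})


def interpolate_price(query_times, known_times, known_prices):
    """Hypothetical helper: linearly interpolate prices at query_times.

    Assumes known_times is sorted in increasing order. Query times outside
    the known range get the first or last known price (np.interp clamps).
    """
    origin = known_times.iloc[0]
    # Convert timestamps to seconds elapsed since the first known time,
    # because np.interp works on plain numbers.
    x = (query_times - origin).dt.total_seconds().to_numpy()
    xp = (known_times - origin).dt.total_seconds().to_numpy()
    return np.interp(x, xp, known_prices.to_numpy())


coarse["start_price"] = interpolate_price(
    coarse["start_time"], fine["start_time"], fine["start_price"]
)
print(coarse)
```

With this data, the 02:00 row falls a quarter of the way between the 01:55 and 02:15 quotes, so it gets roughly 10.225. The 03:00 row lies after the last fine.start_time, so it is clamped to 16.0 rather than extrapolated.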
