pandas 不能对多列使用panda.merge_asof

olqngx59  于 2023-01-19  发布在  其他
关注(0)|答案(3)|浏览(105)

我试图合并两个Pandas Dataframe ,每个包含两个字符串列和一个日期列。

df1
a    b      date
100  200    2022-01-03
100  200    2022-01-04
101  200    2022-01-05
101  200    2022-01-06
101  200    2022-01-07

df2
a    b      date
100  200    2022-01-04
100  200    2022-01-06
101  200    2022-01-03
101  200    2022-01-06
101  200    2022-01-09

目标是在a,B,日期合并它们,并取最近的日期(正向)。

df
a    b      date_x      date_y
100  200    2022-01-03  2022-01-04
100  200    2022-01-04  2022-01-04
101  200    2022-01-05  2022-01-06 (not 2022-01-03 because it is behind not forward)
101  200    2022-01-06  2022-01-06
101  200    2022-01-07  2022-01-09
igetnqfo

igetnqfo1#

我们可以合并ab,通过date_ydate_x之间的最小差异进行过滤,同时考虑正向

new_df = df1.merge(df2, on=['a', 'b'], how='inner')\
            .assign(diff_date=lambda df: df['date_y']
                        .sub(df['date_x'])
                        .where(lambda x: df['date_y'].ge(df['date_x'])), 
                    mask=lambda df: df['diff_date']
                        .eq(df.groupby(['a', 'b', 'date_x'])['diff_date']
                        .transform('min')))\
            .loc[lambda df: df['mask']]\
            .drop(['diff_date', 'mask'], axis=1)
print(new_df)

    a    b     date_x     date_y
0   100  200 2022-01-03 2022-01-04
2   100  200 2022-01-04 2022-01-04
5   101  200 2022-01-05 2022-01-06
8   101  200 2022-01-06 2022-01-06
12  101  200 2022-01-07 2022-01-09
whhtz7ly

whhtz7ly2#

你也可以试试

# merge on a,b and sort based on date
m = df1.merge(df2, on=['a', 'b'], how='left').sort_values(['date_x', 'date_y'])
# only keep dates that are <= df2 date
df = m[m['date_x'] <= m['date_y']]
# drop duplicates and filter
final_df = df.loc[df[['a', 'b', 'date_x']].drop_duplicates(keep='first').index]

      a    b     date_x     date_y
0   100  200 2022-01-03 2022-01-04
2   100  200 2022-01-04 2022-01-04
5   101  200 2022-01-05 2022-01-06
8   101  200 2022-01-06 2022-01-06
12  101  200 2022-01-07 2022-01-09
r3i60tvu

r3i60tvu3#

import pandas as pd

df1 = pd.DataFrame({'a': ['100', '100', '101', '101', '101'],
                    'b': ['200', '200', '200', '200', '200'],
                    'date': ['2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06', '2022-01-07']})

df2 = pd.DataFrame({'a': ['100', '100', '101', '101', '101'],
                    'b': ['200', '200', '200', '200', '200'],
                    'date': ['2022-01-04', '2022-01-06', '2022-01-03', '2022-01-06', '2022-01-09']})
  
df3 = pd.merge(df1,df2,how='left',left_on=['a','b'],right_on=['b','a']).drop(['a_y','b_y'], axis=1) 
df3['date_y'] = df2['date']

相关问题