I have two DataFrames, df1 and df2.
df1 (the lookup)
group_id date value
0 105716 1/30/2019 Soccer
1 105717 1/30/2019 Football
2 105718 1/30/2019 Rest
3 105719 1/30/2019 Soccer
4 105716 1/30/2019 Rest
5 105717 1/31/2019 Rest
6 105718 02/01/2019 Football
7 105719 02/01/2019 Soccer
df2 (the parent)
GROUP_ID STARTDATE ENDDATE
0 105716 1/30/2019 1/30/2019
1 105717 1/30/2019 1/30/2019
2 105718 1/30/2019 1/30/2019
3 105719 1/30/2019 1/30/2019
4 105716 1/30/2019 1/31/2019
5 105717 1/31/2019 1/31/2019
6 105718 1/31/2019 1/31/2019
7 105719 1/31/2019 1/31/2019
8 105716 1/31/2019 1/31/2019
9 105717 1/31/2019 1/31/2019
10 105718 1/31/2019 1/31/2019
11 105719 1/31/2019 2/1/2019
12 105716 2/1/2019 2/1/2019
13 105717 2/1/2019 2/1/2019
14 105718 2/1/2019 2/1/2019
15 105719 2/1/2019 2/1/2019
16 105716 2/1/2019 2/1/2019
17 105717 2/1/2019 2/1/2019
18 105718 2/1/2019 2/1/2019
19 105719 2/1/2019 2/1/2019
20 105716 2/1/2019 2/2/2019
21 105717 2/2/2019 2/2/2019
22 105718 2/2/2019 2/2/2019
23 105719 2/2/2019 2/2/2019
24 105716 2/2/2019 2/2/2019
25 105717 2/2/2019 2/2/2019
26 105718 2/2/2019 2/2/2019
27 105719 2/2/2019 2/3/2019
28 105716 2/3/2019 2/3/2019
29 105722 2/3/2019 2/3/2019
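I am trying to add a "RESULT" field to df2 and populate it with the value from df1 where GROUP_ID = group_id and date falls between STARTDATE and ENDDATE, while keeping all rows of df2. Any NaN/null values should be set to "None". I can do this with a loop, but it takes a long time to finish. Here is everything I have tried. First, np.where():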
df2['RESULT'] = 'None'
df2.result = np.where(((df1.group_id==df2.GROUP_ID)&((df1.date>=df2.STARTDATE)&(df1.date>=df2.ENDDATE))), df1.value, 'None')
then a vectorized approach:
df2.result = df1.value[(df1.group_id==df2.GROUP_ID)&((df1.date>=df2.STARTDATE)&(df1.date>=df2.ENDDATE))]
and a merge approach:
df_activity = pd.merge(df2, df1, left_on='GROUP_ID', right_on='group_id')[((pd.merge(df2, df1, left_on='GROUP_ID', right_on='group_id')['STARTDATE'] <= pd.merge(df2, df1, left_on='GROUP_ID', right_on='group_id').date)&(pd.merge(df2, df1, left_on='GROUP_ID', right_on='group_id')['ENDDATE'] >= pd.merge(df2, df1, left_on='GROUP_ID', right_on='group_id')['date']))]
The first two attempts give me an error:
ValueError: Can only compare identically-labeled DataFrame objects
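The merge works, but it also drops all the rows that don't match. I suppose I could fix that with another merge, but if there is a faster, leaner way, I would like to use it.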
1 Answer
First merge on group_id, then use query to filter the rows by date.
Input DataFrames:
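As a rough sketch of the merge-then-query idea (not necessarily the exact code of this answer), the snippet below rebuilds a small subset of the question's data, merges on the group id, filters by date with query, and writes the result back so that every row of df2 is kept. The helper column `row`, the intermediate names `merged`, `matched` and `result`, and the choice to keep only the first match when several df1 rows qualify are assumptions made for illustration.

```python
import pandas as pd

# Small subset of the question's data (assumed sample, not the full tables).
df1 = pd.DataFrame({
    "group_id": [105716, 105717, 105718, 105719],
    "date": ["1/30/2019", "1/31/2019", "02/01/2019", "02/01/2019"],
    "value": ["Soccer", "Rest", "Football", "Soccer"],
})
df2 = pd.DataFrame({
    "GROUP_ID": [105716, 105717, 105718, 105719, 105722],
    "STARTDATE": ["1/30/2019", "1/30/2019", "1/31/2019", "1/31/2019", "2/3/2019"],
    "ENDDATE": ["1/30/2019", "1/30/2019", "1/31/2019", "2/1/2019", "2/3/2019"],
})

# Compare real dates, not strings.
df1["date"] = pd.to_datetime(df1["date"], format="%m/%d/%Y")
for col in ("STARTDATE", "ENDDATE"):
    df2[col] = pd.to_datetime(df2[col], format="%m/%d/%Y")

# Keep df2's original index in a helper column so matches can be mapped back.
left = df2.reset_index().rename(columns={"index": "row"})

# Step 1: merge on the group id (one output row per df2/df1 pair in a group).
merged = left.merge(df1, left_on="GROUP_ID", right_on="group_id", how="inner")

# Step 2: keep only pairs whose date lies inside [STARTDATE, ENDDATE].
matched = merged.query("STARTDATE <= date and date <= ENDDATE")

# Step 3: write values back to df2; rows without a match become "None".
# Assumption: if several df1 rows match one df2 row, the first one wins.
result = matched.drop_duplicates("row").set_index("row")["value"]
df2["RESULT"] = result.reindex(df2.index).fillna("None")

print(df2)
```

Because the values are mapped back through df2's own index, df2 keeps all of its rows without needing a second merge. The intermediate merge still creates one row per group pairing, so memory use can grow quickly when groups are large.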