pandas: fill a column in a DataFrame using another DataFrame as a lookup table

js81xvg6 posted on 2023-01-19 in Other

I have two DataFrames, df1 and df2.

df1 (the lookup)
   group_id        date     value
0    105716   1/30/2019    Soccer
1    105717   1/30/2019  Football
2    105718   1/30/2019      Rest
3    105719   1/30/2019    Soccer
4    105716   1/30/2019      Rest
5    105717   1/31/2019      Rest
6    105718  02/01/2019  Football
7    105719  02/01/2019    Soccer

df2 (the parent)
    GROUP_ID  STARTDATE    ENDDATE
0     105716  1/30/2019  1/30/2019
1     105717  1/30/2019  1/30/2019
2     105718  1/30/2019  1/30/2019
3     105719  1/30/2019  1/30/2019
4     105716  1/30/2019  1/31/2019
5     105717  1/31/2019  1/31/2019
6     105718  1/31/2019  1/31/2019
7     105719  1/31/2019  1/31/2019
8     105716  1/31/2019  1/31/2019
9     105717  1/31/2019  1/31/2019
10    105718  1/31/2019  1/31/2019
11    105719  1/31/2019   2/1/2019
12    105716   2/1/2019   2/1/2019
13    105717   2/1/2019   2/1/2019
14    105718   2/1/2019   2/1/2019
15    105719   2/1/2019   2/1/2019
16    105716   2/1/2019   2/1/2019
17    105717   2/1/2019   2/1/2019
18    105718   2/1/2019   2/1/2019
19    105719   2/1/2019   2/1/2019
20    105716   2/1/2019   2/2/2019
21    105717   2/2/2019   2/2/2019
22    105718   2/2/2019   2/2/2019
23    105719   2/2/2019   2/2/2019
24    105716   2/2/2019   2/2/2019
25    105717   2/2/2019   2/2/2019
26    105718   2/2/2019   2/2/2019
27    105719   2/2/2019   2/3/2019
28    105716   2/3/2019   2/3/2019
29    105722   2/3/2019   2/3/2019

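I am trying to add a "RESULT" field to df2 and populate it with the value from df1 where GROUP_ID = group_id and date falls between STARTDATE and ENDDATE, while keeping every row of df2. Any NaN/null values should be set to "None". I can do this with a loop, but it takes a while to finish. Here is everything I have tried. np.where():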

df2['RESULT'] = 'None'
df2.result = np.where(((df1.group_id==df2.GROUP_ID)&((df1.date>=df2.STARTDATE)&(df1.date>=df2.ENDDATE))), df1.value, 'None')

a vectorized approach:

df2.result = df1.value[(df1.group_id==df2.GROUP_ID)&((df1.date>=df2.STARTDATE)&(df1.date>=df2.ENDDATE))]

and a merge approach:

df_activity = pd.merge(df2, df1, left_on='GROUP_ID', right_on='group_id')[((pd.merge(df2, df1, left_on='GROUP_ID', right_on='group_id')['STARTDATE'] <= pd.merge(df2, df1, left_on='GROUP_ID', right_on='group_id').date)&(pd.merge(df2, df1, left_on='GROUP_ID', right_on='group_id')['ENDDATE'] >= pd.merge(df2, df1, left_on='GROUP_ID', right_on='group_id')['date']))]

The first two attempts give me an error:

ValueError: Can only compare identically-labeled DataFrame objects

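The merge works, but it also drops every row that has no match. I think I could fix that with another merge, but if there is a faster, leaner way I would like to use it.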
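The error in the first two attempts most likely comes from comparing columns of two frames that have different lengths: pandas compares Series label by label and refuses when the indexes differ. A minimal sketch with made-up two- and three-row Series (not the question's data; the exact wording of the message can vary between pandas versions and between Series and DataFrame operands):

```python
import pandas as pd

# Hypothetical stand-ins for df2.GROUP_ID and df1.group_id.
left = pd.Series([105716, 105717])           # index 0..1
right = pd.Series([105716, 105717, 105718])  # index 0..2

try:
    left == right          # element-wise comparison needs identical labels
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

This is why a row-by-row lookup across two differently shaped frames needs a join (or a loop) rather than a direct comparison.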

kx1ctssn #1

First merge on group_id, then use query to filter the rows by date:

out = (df2.merge(df1, on='group_id')
          .query("(startdate <= date) & (date <= enddate)"))
print(out)

# Output:
   group_id  startdate    enddate       date  value
0         1 2023-01-15 2023-01-20 2023-01-17    100
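The date filter is only meaningful when the columns hold real datetimes. The frames in the question appear to hold strings such as 1/30/2019 and 02/01/2019 (an assumption based on how they print), and strings compare character by character. A minimal sketch of the difference, using pd.to_datetime with an explicit month/day/year format:

```python
import pandas as pd

# Dates as they appear in the question (assumed to be plain strings).
raw = pd.Series(["1/30/2019", "02/01/2019"])

# As strings, "1/30/2019" sorts after "02/01/2019" because "1" > "0".
string_cmp = raw[0] <= raw[1]

# Parsed as datetimes, 30 January precedes 1 February.
parsed = pd.to_datetime(raw, format="%m/%d/%Y")
date_cmp = parsed[0] <= parsed[1]

print(string_cmp, date_cmp)  # False True
```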

Input DataFrames:

>>> df1
   group_id       date  value
0         1 2023-01-17    100
1         1 2023-01-30    200

>>> df2
   group_id  startdate    enddate
0         1 2023-01-15 2023-01-20
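The merge and query above keep only the matching rows. One possible way to also keep every row of df2 and fill the rest with "None", as the question asks, is to carry df2's row labels through the merge and map the matches back. The sketch below uses a four-row subset of the question's data with the question's column names; the helper column name "row" and the rule that the first match wins when several df1 rows qualify are assumptions, not part of the original answer.

```python
import pandas as pd

# Subset of the question's data, parsed as datetimes.
df1 = pd.DataFrame({
    "group_id": [105716, 105717, 105718],
    "date": pd.to_datetime(["1/30/2019", "1/31/2019", "02/01/2019"],
                           format="%m/%d/%Y"),
    "value": ["Soccer", "Rest", "Football"],
})
df2 = pd.DataFrame({
    "GROUP_ID": [105716, 105717, 105718, 105722],
    "STARTDATE": pd.to_datetime(["1/30/2019", "1/31/2019", "1/31/2019", "2/3/2019"],
                                format="%m/%d/%Y"),
    "ENDDATE": pd.to_datetime(["1/30/2019", "1/31/2019", "1/31/2019", "2/3/2019"],
                              format="%m/%d/%Y"),
})

# Keep df2's row labels in a helper column ("row" is a hypothetical name).
merged = (df2.rename_axis("row").reset_index()
             .merge(df1, left_on="GROUP_ID", right_on="group_id"))

# Keep only lookup rows whose date falls inside the interval.
in_range = (merged["STARTDATE"] <= merged["date"]) & (merged["date"] <= merged["ENDDATE"])

# Assumption: if several lookup rows qualify, the first one wins.
matches = (merged[in_range]
           .drop_duplicates("row")
           .set_index("row")["value"])

# Map back onto every df2 row; unmatched rows become the string "None".
df2["RESULT"] = matches.reindex(df2.index).fillna("None")
print(df2)
```

With this subset, the first two rows get "Soccer" and "Rest", while the row for 105718 (whose only lookup date falls outside its interval) and the row for 105722 (absent from df1) get "None". The merge still materializes every GROUP_ID pairing before filtering, so memory use grows with the number of lookup rows per group.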
