Use values from one DataFrame to look up values in another DataFrame

Posted by tnkciper on 2021-08-25 in Java

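I'm trying to use a value from a column of one DataFrame (df1) as the index for a lookup in another DataFrame (df2).
I came up with a solution using apply and a lambda function: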

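max_edad = int(df2.iloc[:, 0].max() - 1)  # The value will be 116

# Linear interpolation of lx_1970 between the integer ages int(x) and int(x) + 1;
# rows with fecha_ord >= max_edad get 0
df1['Vivos(t)'] = df1['fecha_ord'].apply(
    lambda x: df2.loc[int(x), 'lx_1970'] * (1 - (x % 1))
    + df2.loc[int(x) + 1, 'lx_1970'] * (x % 1)
    if x < max_edad else 0
)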
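For reference, the lambda is a linear interpolation of lx_1970 between the two neighbouring integer ages. Writing x for fecha_ord, l(n) for df2.loc[n, 'lx_1970'], n = int(x) (equal to the floor of x for non-negative ages) and f = x % 1:

$$
\mathrm{Vivos}(t) =
\begin{cases}
l(n)\,(1 - f) + l(n+1)\,f, & x < \mathrm{max\_edad} \\
0, & \text{otherwise}
\end{cases}
\qquad n = \lfloor x \rfloor,\quad f = x - n
$$

With the first row of df1 (x = 45.325120, so n = 45 and f = 0.325120) and l(45) = 9.522749, l(46) = 9.507039 from df2, this gives 9.522749 × 0.674880 + 9.507039 × 0.325120 ≈ 9.517641, which agrees with the expected 9.517642 in df3 to within the rounding of the printed values.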

However, I'm running it on a huge database and it is very slow (although it works).
Do you know how I could do this differently to speed it up?
Here are some samples of my DataFrames:
df1
t     fecha       factor_desc  fecha_ord
0     2016-04-01  1.000000     45.325120
1     2016-05-01  0.996339     45.407255
2     2016-06-01  0.992691     45.492129
3     2016-07-01  0.989056     45.574264
4     2016-08-01  0.985435     45.659138
5     2016-09-01  0.981827     45.744011
6     2016-10-01  0.978232     45.826146
7     2016-11-01  0.974650     45.911020
8     2016-12-01  0.971082     45.993155
9     2017-01-01  0.967526     46.078029
10    2017-02-01  0.963984     46.162902
11    2017-03-01  0.960454     46.239562
12    2017-04-01  0.956938     46.324435
13    2017-05-01  0.953434     46.406571
14    2017-06-01  0.949943     46.491444
...   ...         ...          ...
1390  2132-02-01  0.057234     161.158111
1391  2132-03-01  0.057163     161.237509
1392  2132-04-01  0.057093     161.322382
df2
Edad   lx_1970
0.0    1.000000
1.0    9.909948
2.0    9.901297
3.0    9.896776
4.0    9.892829
5.0    9.889542
6.0    9.886405
...    ...
41.0   9.577991
42.0   9.565103
43.0   9.551536
44.0   9.537515
45.0   9.522749
46.0   9.507039
...    ...
116.0  0
I expect the result to look like this:
df3
t     fecha       factor_desc  fecha_ord   Vivos(t)
0     2016-04-01  1.000000     45.325120   9.517642
1     2016-05-01  0.996339     45.407255   9.516351
2     2016-06-01  0.992691     45.492129   9.515018
3     2016-07-01  0.989056     45.574264   9.513728
4     2016-08-01  0.985435     45.659138   9.512394
5     2016-09-01  0.981827     45.744011   9.511061
6     2016-10-01  0.978232     45.826146   …
7     2016-11-01  0.974650     45.911020   …
8     2016-12-01  0.971082     45.993155   9.507147
9     2017-01-01  0.967526     46.078029   9.505715
10    2017-02-01  0.963984     46.162902   9.504274
11    2017-03-01  0.960454     46.239562   9.502972
12    2017-04-01  0.956938     46.324435   9.501532
13    2017-05-01  0.953434     46.406571   9.500137
14    2017-06-01  0.949943     46.491444   9.498696
...   ...         ...          ...         ...
1390  2132-02-01  0.057234     161.158111  0.0
1391  2132-03-01  0.057163     161.237509  0.0
1392  2132-04-01  0.057093     161.322382  0.0
Thank you very much!

flvtvl50


I think it's better to split the calculation into several steps:

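df1['fecha_ord_int'] = df1['fecha_ord'].astype(int)
df1['fecha_ord_dec'] = df1['fecha_ord'] % 1
df2['lx_1970_next'] = df2['lx_1970'].shift(-1)

# how='left' keeps rows whose age is not in df2 (e.g. 161), which 'inner' would drop
df1 = df1.merge(df2, how='left', left_on='fecha_ord_int', right_on='Edad')

# now do the calculation you want, e.g. the interpolation from the question
df1['Vivos(t)'] = df1['lx_1970'] * (1 - df1['fecha_ord_dec']) + df1['lx_1970_next'] * df1['fecha_ord_dec']
df1.loc[df1['fecha_ord'] >= max_edad, 'Vivos(t)'] = 0

# you can drop the columns you don't want later
df1 = df1.drop(columns=['fecha_ord_int', 'fecha_ord_dec', 'Edad', 'lx_1970', 'lx_1970_next'])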

Hope this helps.
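A further option, not part of the answer: because df2 holds lx_1970 at consecutive integer ages, the lookup plus interpolation is exactly what NumPy's np.interp computes, in one vectorised call instead of one Python call per row. This is a sketch under assumptions: df2's Edad column is sorted with no gaps, max_edad is passed in (116 per the question's comment), and the helper name vivos_interp and the sample rows (copied from the excerpts in the question) are illustrative only.

```python
import numpy as np
import pandas as pd


def vivos_interp(fecha_ord, edad, lx, max_edad):
    """Vectorised version of the question's lambda (hypothetical helper).

    Assumes `edad` is sorted ascending with consecutive integer ages, so that
    np.interp reproduces lx[int(x)] * (1 - x % 1) + lx[int(x) + 1] * (x % 1).
    """
    x = np.asarray(fecha_ord, dtype=float)
    # Linear interpolation between the two neighbouring ages, for all rows at once
    interpolated = np.interp(x, np.asarray(edad, dtype=float), np.asarray(lx, dtype=float))
    # Same cut-off as the original `if x < max_edad else 0`
    return np.where(x < max_edad, interpolated, 0.0)


# Hypothetical sample: an excerpt of df2 (ages 41-46) and a few fecha_ord values from df1
df2 = pd.DataFrame({
    'Edad': [41.0, 42.0, 43.0, 44.0, 45.0, 46.0],
    'lx_1970': [9.577991, 9.565103, 9.551536, 9.537515, 9.522749, 9.507039],
})
df1 = pd.DataFrame({'fecha_ord': [45.325120, 45.407255, 45.492129,
                                  45.574264, 45.659138, 161.158111]})
max_edad = 116  # value stated in the question's comment

df1['Vivos(t)'] = vivos_interp(df1['fecha_ord'], df2['Edad'], df2['lx_1970'], max_edad)
print(df1)
```

Outside the range of Edad, np.interp returns the end values instead of raising a KeyError, so the max_edad cut-off is still what sets those rows to 0.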
