pandas: I'm getting NaN values when filling a DataFrame with data from another DataFrame

Posted by vm0i2vca on 2023-02-14 in Other

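I have a DataFrame df1 in which some values are zero... (df1). I have another DataFrame, df2, built by grouping df1 on time with groupby. When I try to fill the zero values in df1 with the values from df2, I get NaN... (final dataframe)
I am using the code below: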

for x in df2['time']:
    df1.loc[(df1['i1'] == 0) & (df1['time'] == x), 'i1'] = df2[df2['time'] == x]['i1']

Answer #1, by nqwrtyyt

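This line:

df1.loc[(df1['i1'] == 0) & (df1['time'] == x), 'i1'] = df2[df2['time'] == x]['i1']

returns NaN because the indexes of df2 and df1 are not aligned. The right-hand side is a Series that keeps df2's row labels, and pandas aligns it on those labels against the df1 rows selected by the mask; any selected df1 row whose label has no match in that Series gets NaN.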
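To make the alignment problem concrete, here is a small sketch on hypothetical toy data (the values and the "t1"/"t2" times are invented; only the column name i1 comes from the question). It runs the question's loop as written, then a minimal fix that pulls a scalar out of df2 so there is no index left to align:

```python
import pandas as pd

# Hypothetical toy data shaped like the question: df1 has zeros to fill,
# df2 has one value per time (here, the per-time mean of df1).
df1 = pd.DataFrame({
    "time": ["t1", "t1", "t2", "t2"],
    "i1": [4.0, 0.0, 6.0, 0.0],
})
df2 = df1.groupby("time", as_index=False)["i1"].mean()  # index 0 -> t1, 1 -> t2

# Original pattern: the right-hand side is a Series labelled 0 or 1 (df2's index),
# but the rows being set are df1 rows 1 and 3, so the labels never match -> NaN.
buggy = df1.copy()
for x in df2["time"]:
    mask = (buggy["i1"] == 0) & (buggy["time"] == x)
    buggy.loc[mask, "i1"] = df2[df2["time"] == x]["i1"]

# Minimal fix for the loop: extract a scalar, so there is no index to align.
fixed = df1.copy()
for x in df2["time"]:
    value = df2.loc[df2["time"] == x, "i1"].iloc[0]
    mask = (fixed["i1"] == 0) & (fixed["time"] == x)
    fixed.loc[mask, "i1"] = value

print(buggy)
print(fixed)
```

If a zero row happened to share a label with the df2 row (for example, df1 row 0 and df2 row 0), that row would be filled "correctly" by coincidence, which is why this bug can look intermittent on real data.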
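A suggestion for asking technical questions: don't post screenshots; provide the code that builds df1 and df2 instead. That makes it much easier for people trying to help you to reproduce the problem.
What follows is my best attempt at answering you.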

In [2]: df1 = pd.DataFrame({
   ...:     "time": [
   ...:         pd.Timestamp("2018-01-01 00:00:00"),
   ...:         pd.Timestamp("2018-01-01 00:00:00"),
   ...:         pd.Timestamp("2018-01-01 00:00:00"),
   ...:         pd.Timestamp("2010-01-01 00:00:10"),
   ...:         pd.Timestamp("2010-01-01 00:00:10"),
   ...:         pd.Timestamp("2010-01-01 00:00:10"),
   ...:     ],
   ...:     "indicator": [
   ...:         0, 1, 2, 0, 1, 2    ]
   ...: })

In [3]: df1
Out[3]: 
                 time  indicator
0 2018-01-01 00:00:00          0
1 2018-01-01 00:00:00          1
2 2018-01-01 00:00:00          2
3 2010-01-01 00:00:10          0
4 2010-01-01 00:00:10          1
5 2010-01-01 00:00:10          2

In [4]: df2 = df1.groupby("time").mean().reset_index()

In [5]: df2
Out[5]: 
                 time  indicator
0 2010-01-01 00:00:10        1.0
1 2018-01-01 00:00:00        1.0

In [6]: out = df1.merge(df2, on="time", suffixes=("_df1", "_df2")) # we merge to align the indices

In [7]: out
Out[7]: 
                 time  indicator_df1  indicator_df2
0 2018-01-01 00:00:00              0            1.0
1 2018-01-01 00:00:00              1            1.0
2 2018-01-01 00:00:00              2            1.0
3 2010-01-01 00:00:10              0            1.0
4 2010-01-01 00:00:10              1            1.0
5 2010-01-01 00:00:10              2            1.0

In [8]: out["indicator"] = out["indicator_df1"]

In [9]: mask = out["indicator_df1"] == 0

In [10]: out.loc[mask, "indicator"] = out.loc[mask, "indicator_df2"]

In [11]: out
Out[11]: 
                 time  indicator_df1  indicator_df2  indicator
0 2018-01-01 00:00:00              0            1.0          1
1 2018-01-01 00:00:00              1            1.0          1
2 2018-01-01 00:00:00              2            1.0          2
3 2010-01-01 00:00:10              0            1.0          1
4 2010-01-01 00:00:10              1            1.0          1
5 2010-01-01 00:00:10              2            1.0          2

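What the code above does is merge the source data with the data you want to impute, then correct it with a boolean mask. That gives the right answer and is much faster than running a Python for loop, because the work is done by vectorized pandas operations.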
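If df2 already exists as its own table, as in the question, another vectorized option (swapped in here, not part of the original answer) is to look up each df1 row's value by time with Series.map, so alignment happens on time rather than on row labels. A sketch on hypothetical toy data, assuming the time values in df2 are unique:

```python
import pandas as pd

# Hypothetical toy data in the question's shape (column name "i1" from the question).
df1 = pd.DataFrame({
    "time": ["t1", "t1", "t2", "t2"],
    "i1": [4.0, 0.0, 6.0, 0.0],
})
# df2: one row per time, e.g. the result of a groupby on df1.
df2 = df1.groupby("time", as_index=False)["i1"].mean()

# Build a lookup Series indexed by time, then map each df1 row's time through it.
# The result is indexed like df1, so assigning it back lines up row by row.
lookup = df2.set_index("time")["i1"]
fill_values = df1["time"].map(lookup)

# Overwrite only the zeros.
zeros = df1["i1"] == 0
df1.loc[zeros, "i1"] = fill_values[zeros]
print(df1)
```

Unlike the merge, this leaves df1's columns unchanged and does not create intermediate "_df1"/"_df2" columns.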
Note that the merge-based approach can be simplified further by relying on groupby.transform, which avoids creating two DataFrames and merging them.
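A minimal sketch of that transform-based simplification, assuming the goal is the same as in the session (replace zeros with the per-time mean). Using Series.mask for the replacement step is one choice among several, not necessarily what the answerer had in mind:

```python
import pandas as pd

# Same data as df1 in the session above.
df1 = pd.DataFrame({
    "time": pd.to_datetime(["2018-01-01 00:00:00"] * 3 + ["2010-01-01 00:00:10"] * 3),
    "indicator": [0, 1, 2, 0, 1, 2],
})

# transform("mean") returns one value per row, indexed like df1,
# so there is no second DataFrame to build or merge.
group_mean = df1.groupby("time")["indicator"].transform("mean")

# Keep non-zero values; replace zeros with their group's mean.
df1["indicator"] = df1["indicator"].mask(df1["indicator"] == 0, group_mean)
print(df1)
```

Because group_mean already shares df1's index, the replacement lines up row by row without any merge.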
