pandas: merge two DataFrames and combine the result for NaN values

pprl5pva posted on 2023-05-12 in Other

I have two DataFrames with the same columns, holding the same kind of information but coming from different sources:

import pandas as pd

df_orders = pd.DataFrame({'id': [1, 2, 3], 'model': ['A1', 'A3', 'A6'], 'color': ['Red', 'Blue', 'Green']})
df_billed = pd.DataFrame({'id': [1, 6, 7], 'model': ['A1', 'A7', 'B1'], 'color': ['Purple', 'Pink', 'Red']})

Then I did a left merge of df_billed with df_orders on id, adding suffixes where the column names overlap:

merge_df = pd.merge(df_billed,df_orders,on='id',how='left',suffixes=('_order','_billed'))
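One detail worth checking: pandas applies `suffixes` in (left, right) order, so in this call the `_order` suffix ends up on the columns that come from `df_billed` (the left frame) and `_billed` on those from `df_orders`. A minimal check with the same data:

```python
import pandas as pd

df_orders = pd.DataFrame({'id': [1, 2, 3], 'model': ['A1', 'A3', 'A6'], 'color': ['Red', 'Blue', 'Green']})
df_billed = pd.DataFrame({'id': [1, 6, 7], 'model': ['A1', 'A7', 'B1'], 'color': ['Purple', 'Pink', 'Red']})

# The first suffix is applied to the left frame's overlapping columns,
# the second to the right frame's overlapping columns.
merge_df = pd.merge(df_billed, df_orders, on='id', how='left', suffixes=('_order', '_billed'))

print(merge_df.columns.tolist())
print(merge_df['color_order'].tolist())   # these values come from df_billed
print(merge_df['color_billed'].tolist())  # these come from df_orders (NaN where the id is missing)
```

This does not change the approach below; it only explains why the `_order` columns are the ones fully populated here.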

Result:

   id model_order color_order model_billed color_billed
0   1          A1      Purple           A1          Red
1   6          A7        Pink          NaN          NaN
2   7          B1         Red          NaN          NaN

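The _billed columns should take priority over the _order columns. I'd like a DataFrame where, if there is no billed information, the order value is used instead, and the suffixes are removed: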

   id model_billed color_billed
0   1           A1          Red
1   6           A7         Pink
2   7           B1          Red

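I thought about using combine_first to combine the columns and renaming them at the end, but the code looks a bit messy, so I'm looking for a better-designed solution.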
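For comparison, the combine_first idea can be written without a merge at all by indexing both frames on `id`. This is a sketch under two assumptions taken from the merge above: values from `df_orders` win and `df_billed` only fills the gaps, and only the ids present in `df_billed` are kept (what `how='left'` did). The names `left`, `right` and `result` are illustrative only.

```python
import pandas as pd

df_orders = pd.DataFrame({'id': [1, 2, 3], 'model': ['A1', 'A3', 'A6'], 'color': ['Red', 'Blue', 'Green']})
df_billed = pd.DataFrame({'id': [1, 6, 7], 'model': ['A1', 'A7', 'B1'], 'color': ['Purple', 'Pink', 'Red']})

left = df_billed.set_index('id')
right = df_orders.set_index('id')

# reindex keeps only the ids of df_billed (NaN rows where df_orders has no match);
# combine_first then fills those NaNs from df_billed. No suffixes are ever created.
result = right.reindex(left.index).combine_first(left).reset_index()
result = result[['id', 'model', 'color']]  # fix the column order explicitly
print(result)
```

Because the columns are never suffixed, there is nothing to rename afterwards; the trade-off is that it relies on `id` being unique in each frame.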


Answer #1 (by 11dmarpk)

You can simply use .fillna() and fill the NAs from the _order columns:

# Where the _billed value is missing (NaN), fall back to the _order value from the same row
merge_df['model_billed'] = merge_df['model_billed'].fillna(merge_df['model_order'])
merge_df['color_billed'] = merge_df['color_billed'].fillna(merge_df['color_order'])

Output:

merge_df[['id', 'model_billed', 'color_billed']]
    id  model_billed    color_billed
0   1   A1              Red
1   6   A7              Pink
2   7   B1              Red

Update

If there are more such columns, you can use a loop like this:

col_names = ['model', 'color']
for col in col_names:
    merge_df[col+'_billed'] = merge_df[col+'_billed'].fillna(merge_df[col+'_order'])
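If you would rather not hard-code the column list, a sketch like the following derives it from the columns the two frames share (other than the key), writes the combined values into unsuffixed columns, and keeps only those, which also covers the "remove the suffixes" part of the question. The name `result` is just for illustration.

```python
import pandas as pd

df_orders = pd.DataFrame({'id': [1, 2, 3], 'model': ['A1', 'A3', 'A6'], 'color': ['Red', 'Blue', 'Green']})
df_billed = pd.DataFrame({'id': [1, 6, 7], 'model': ['A1', 'A7', 'B1'], 'color': ['Purple', 'Pink', 'Red']})

merge_df = pd.merge(df_billed, df_orders, on='id', how='left', suffixes=('_order', '_billed'))

# Overlapping non-key columns, i.e. the ones that received suffixes in the merge.
col_names = [c for c in df_billed.columns if c != 'id' and c in df_orders.columns]

for col in col_names:
    # Prefer the _billed value; fall back to _order where it is NaN.
    merge_df[col] = merge_df[col + '_billed'].fillna(merge_df[col + '_order'])

# Keep the key plus the combined, unsuffixed columns.
result = merge_df[['id'] + col_names]
print(result)
```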
