pandas: merge two DataFrames and coalesce the results for NaN values

Posted by pprl5pva on 2023-05-12 in Other


I have two DataFrames with the same columns that contain the same information but come from different sources:

df_orders = pd.DataFrame({'id':[1,2,3],'model':['A1','A3','A6'], 'color':['Red','Blue','Green']})
df_billed = pd.DataFrame({'id':[1,6,7],'model':['A1','A7','B1'], 'color':['Purple','Pink','Red']})

Then I did a left merge on df_billed by id, adding suffixes because the column names overlap:

merge_df = pd.merge(df_billed,df_orders,on='id',how='left',suffixes=('_order','_billed'))

Result:

   id model_order color_order model_billed color_billed
0   1          A1      Purple           A1          Red
1   6          A7        Pink          NaN          NaN
2   7          B1         Red          NaN          NaN
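
A note on the column names, since they are easy to misread: pandas applies the suffixes tuple in (left, right) order. df_billed is the left frame in the call above, so its columns receive _order, while the columns coming from df_orders receive _billed. The sketch below only reproduces the merge shown above and prints where each column's values come from.

```python
import pandas as pd

df_orders = pd.DataFrame({'id': [1, 2, 3], 'model': ['A1', 'A3', 'A6'], 'color': ['Red', 'Blue', 'Green']})
df_billed = pd.DataFrame({'id': [1, 6, 7], 'model': ['A1', 'A7', 'B1'], 'color': ['Purple', 'Pink', 'Red']})

# suffixes=(left_suffix, right_suffix): df_billed is the left frame here,
# so its overlapping columns are the ones that end up with '_order'.
merge_df = pd.merge(df_billed, df_orders, on='id', how='left', suffixes=('_order', '_billed'))

# The '_order' columns hold the df_billed values ...
print(merge_df['color_order'].tolist())
# ... and the '_billed' columns hold the df_orders values, NaN where no id matched.
print(merge_df['color_billed'].isna().tolist())
```

If the names are meant to match their sources, swapping the two suffixes in the tuple would be one way to get there.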

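The _order columns take priority over the _billed ones in the column order. What I would like instead is a DataFrame where, if there is no billed information, the order value is used, and the suffixes are removed: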

   id model_billed color_billed
0   1           A1          Red
1   6           A7         Pink
2   7           B1          Red

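Ideally, I thought of using combine_first to coalesce the columns and renaming them at the end, but the code looks a bit messy, so I am looking for a better-designed solution.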
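
For reference, here is one possible shape of the combine_first idea. This is a sketch, not the asker's actual code. It assumes that id is unique in both frames and that the df_orders value should win wherever an id matches, which is what the merged _billed columns contain. It skips the merge, so no suffixes are created and nothing needs renaming.

```python
import pandas as pd

df_orders = pd.DataFrame({'id': [1, 2, 3], 'model': ['A1', 'A3', 'A6'], 'color': ['Red', 'Blue', 'Green']})
df_billed = pd.DataFrame({'id': [1, 6, 7], 'model': ['A1', 'A7', 'B1'], 'color': ['Purple', 'Pink', 'Red']})

# Index both frames by id so combine_first can align rows by id.
orders = df_orders.set_index('id')
billed = df_billed.set_index('id')

# Take the df_orders value where one exists, otherwise fall back to df_billed.
combined = orders.combine_first(billed)

# Keep only the ids present in df_billed, as the left merge does.
result = (
    combined.loc[df_billed['id'].tolist(), ['model', 'color']]
    .rename_axis('id')
    .reset_index()
)
print(result)
```

Swapping orders and billed in the combine_first call would reverse which source takes priority.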

pandas

Source: https://stackoverflow.com/questions/68234319/merge-two-dfs-and-coalesce-the-results-for-nan-values

1 answer


11dmarpk1#

You can simply use .fillna() and fill the NaN values from the _order columns:

merge_df['model_billed'] = merge_df['model_billed'].fillna(merge_df['model_order'])
merge_df['color_billed'] = merge_df['color_billed'].fillna(merge_df['color_order'])

Output:

merge_df[['id', 'model_billed', 'color_billed']]
    id  model_billed    color_billed
0   1   A1              Red
1   6   A7              Pink
2   7   B1              Red
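
The question also asks for the suffixes to be removed, which the snippet above leaves in place. A minimal end-to-end sketch, using the frames and merge call from the question and an explicit rename mapping (the plain names model and color are an assumption about what the final columns should be called):

```python
import pandas as pd

df_orders = pd.DataFrame({'id': [1, 2, 3], 'model': ['A1', 'A3', 'A6'], 'color': ['Red', 'Blue', 'Green']})
df_billed = pd.DataFrame({'id': [1, 6, 7], 'model': ['A1', 'A7', 'B1'], 'color': ['Purple', 'Pink', 'Red']})

merge_df = pd.merge(df_billed, df_orders, on='id', how='left', suffixes=('_order', '_billed'))

# Fill the gaps in the '_billed' columns from the matching '_order' columns.
merge_df['model_billed'] = merge_df['model_billed'].fillna(merge_df['model_order'])
merge_df['color_billed'] = merge_df['color_billed'].fillna(merge_df['color_order'])

# Keep only the coalesced columns and strip the suffix via an explicit mapping.
result = merge_df[['id', 'model_billed', 'color_billed']].rename(
    columns={'model_billed': 'model', 'color_billed': 'color'}
)
print(result)
```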

Update:

If there are more columns like these, you can use a loop:

col_names = ['model', 'color']
for col in col_names:
    merge_df[col+'_billed'] = merge_df[col+'_billed'].fillna(merge_df[col+'_order'])
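
If listing the column names by hand becomes tedious, the base names could also be discovered from the suffix. The helper below is a hypothetical sketch (coalesce_suffixed and its parameters are not part of pandas). It assumes every column ending in the keep suffix has a partner ending in the fallback suffix, and it also drops the suffixes at the end.

```python
import pandas as pd


def coalesce_suffixed(df, keep='_billed', fallback='_order'):
    """Hypothetical helper: fill NaN in each '<name><keep>' column from
    '<name><fallback>', then return a copy without the suffixed columns."""
    out = df.copy()
    # Base names are discovered from the columns that end with the `keep` suffix.
    bases = [c[:-len(keep)] for c in out.columns if c.endswith(keep)]
    for base in bases:
        out[base] = out[base + keep].fillna(out[base + fallback])
    # Drop the suffixed source columns, leaving only the coalesced ones.
    suffixed = [base + s for base in bases for s in (keep, fallback)]
    return out.drop(columns=suffixed)


df_orders = pd.DataFrame({'id': [1, 2, 3], 'model': ['A1', 'A3', 'A6'], 'color': ['Red', 'Blue', 'Green']})
df_billed = pd.DataFrame({'id': [1, 6, 7], 'model': ['A1', 'A7', 'B1'], 'color': ['Purple', 'Pink', 'Red']})
merge_df = pd.merge(df_billed, df_orders, on='id', how='left', suffixes=('_order', '_billed'))

result = coalesce_suffixed(merge_df)
print(result)
```

The helper works on a copy, so merge_df keeps its suffixed columns in case they are still needed.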

Posted 2023-05-12
