pandas: How to compare two DataFrames and return a new DataFrame with only the records that have changed

Posted by q3qa4bjr on 2023-04-04 in Other


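I want to build a Python script that compares two pandas DataFrames and creates a new df that I can use to update my SQL table. I create df1 by reading the existing table, and df2 by reading new data through an API call. I want to isolate the rows that changed and update the SQL table with the new values.
I tried to do the comparison with an outer merge, but I need help returning a DataFrame that contains only the records with a different value in any field.
Here is my sample df1:

Here is my sample df2:

My desired output:

This function returns the entire DataFrame and does not work as expected: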

def compare_dataframes(df1, df2, pk_col):
    # Merge the two dataframes on the primary key column
    df_merged = pd.merge(df1, df2, on=pk_col, how='outer', suffixes=('_old', '_new'))

    # Identify the rows that are different between the two dataframes
    df_diff = df_merged[df_merged.isna().any(axis=1)]

    # Drop the columns from the old dataframe and rename the columns from the new dataframe
    df_diff = df_diff.drop(columns=[col for col in df_diff.columns if col.endswith('_old')])
    df_diff = df_diff.rename(columns={col: col.replace('_new', '') for col in df_diff.columns})

    return df_diff

pandas

Source: https://stackoverflow.com/questions/75900997/how-to-compare-two-dataframes-and-return-a-new-dataframe-with-only-the-records-t

1 Answer

llmtgqce1#

One approach is to concatenate the two DataFrames and then drop the duplicates, as follows:

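# Stack both frames under keys 1 (old) and 2 (new); avoid naming the variable `dict`
frames = {1: df1, 2: df2}
df = pd.concat(frames)
# keep=False removes every row whose values occur more than once
df = df.drop_duplicates(keep=False)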

As described in an answer to a similar question: https://stackoverflow.com/a/42649293/21442120
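One caveat worth noting: `drop_duplicates` compares column values only and ignores the index. When `id` lives in the index, a changed row whose new values happen to match some other row can therefore be dropped by mistake. Below is a minimal sketch of a variant that moves `id` into a regular column before deduplicating. It assumes both frames share the same columns and a single named index; the function name `changed_rows` and the sample frames are made up for illustration.

```python
import pandas as pd


def changed_rows(old, new):
    """Rows that differ between `old` and `new`, preferring the `new` version.

    Assumes both frames are indexed by the same single, named key.
    Rows that exist only in `old` are also returned, as in the concat approach.
    """
    key = new.index.name
    # Turn the key into a column so it takes part in the duplicate check
    stacked = pd.concat([old.reset_index(), new.reset_index()], ignore_index=True)
    # keep=False drops every row that appears unchanged in both frames
    diff = stacked.drop_duplicates(keep=False)
    # If a key is left twice (old and new version), keep the later one
    return diff.drop_duplicates(subset=[key], keep="last").set_index(key)


# Hypothetical data: id 1 changes from 'z' to 'y', which equals id 0's values
old = pd.DataFrame({"id": [0, 1], "field1": ["x", "x"], "field2": ["y", "z"]}).set_index("id")
new = pd.DataFrame({"id": [0, 1], "field1": ["x", "x"], "field2": ["y", "y"]}).set_index("id")

result = changed_rows(old, new)
print(result)
```

With the index left out of the comparison, the new version of id 1 would look like a duplicate of id 0 and be discarded; including the key avoids that.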

from io import StringIO
import pandas as pd

DF1 = StringIO("""
id,field1,field2,field3,field4
0,x,y,,b
1,x,,,
2,x,y,z,
3,x,y,z,b
4,x,y,,b""")

DF2 = StringIO("""
id,field1,field2,field3,field4
0,x,y,,b
1,x,,a,
2,x,y,z,
3,x,y,z,b
4,x,y,a,b
""")

df1 = pd.read_csv(DF1, index_col='id')
df2 = pd.read_csv(DF2, index_col='id')

# STEP 1: stack both frames under keys 1 (old) and 2 (new), then drop every
# row whose values occur more than once, leaving only the rows that differ
frames = {1: df1, 2: df2}
df = pd.concat(frames)
df3 = df.drop_duplicates(keep=False).reset_index()

# STEP 2: for each id keep the last remaining row (the df2 version),
# then drop the helper column that held the concat key
df4 = df3.drop_duplicates(subset=['id'], keep='last')
df4 = df4.drop('level_0', axis=1)
df4.head()

This gives the desired output:

   id field1 field2 field3 field4
1   1      x    NaN      a    NaN
2   4      x      y      a      b
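For completeness, the merge-based function from the question can probably be repaired as well. Its filter, `df_merged.isna().any(axis=1)`, selects rows that contain any missing value rather than rows whose values changed. The following is a rough sketch of one way to compare the old and new columns directly. It assumes the key is a regular column and that both frames have the same columns; the `_old` suffix and the sample data are illustrative choices, not part of the original code.

```python
import pandas as pd


def compare_dataframes(df1, df2, pk_col):
    """Return the rows of df2 that are new or differ from df1 in any field."""
    value_cols = [c for c in df2.columns if c != pk_col]
    # Left merge keeps every row of df2; old values get the '_old' suffix
    merged = df2.merge(df1, on=pk_col, how="left",
                       suffixes=("", "_old"), indicator=True)
    new_vals = merged[value_cols]
    old_vals = merged[[f"{c}_old" for c in value_cols]].set_axis(value_cols, axis=1)
    # A cell counts as changed unless both sides are equal or both are missing
    differs = (new_vals != old_vals) & ~(new_vals.isna() & old_vals.isna())
    # Rows without a match in df1 are new records and are always included
    changed = differs.any(axis=1) | (merged["_merge"] == "left_only")
    return merged.loc[changed, [pk_col] + value_cols].reset_index(drop=True)


# Sample data modeled on the example above, with id as a regular column
df1 = pd.DataFrame({"id": [0, 1, 4],
                    "field3": [None, None, None],
                    "field4": ["b", None, "b"]})
df2 = pd.DataFrame({"id": [0, 1, 4],
                    "field3": [None, "a", "a"],
                    "field4": ["b", None, "b"]})

result = compare_dataframes(df1, df2, "id")
print(result)
```

Rows that were deleted from the new data are not reported by this sketch, which seems to fit the goal of updating existing SQL rows with new values.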

Answered 2023-04-04
