How do I find all elements in one pandas DataFrame that differ from the elements in another DataFrame?

nkoocmlb  posted on 2023-06-20  in  Other

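I have two DataFrames consisting mainly of columns of pandas datetime objects that represent deadlines. Each row is a project, and each column that holds a date is a deadline for that project. Both DataFrames have the same format. Here is the output of df.info() to make this a little easier to understand: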

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59 entries, 0 to 58
Data columns (total 12 columns):
 #   Column                 Non-Null Count  Dtype         
---  ------                 --------------  -----         
 0   Status                 59 non-null     object        
 1   ID                     44 non-null     object        
 2   Current assignment     59 non-null     object        
 3   P0                     20 non-null     datetime64[ns]
 4   P1                     29 non-null     datetime64[ns]
 5   P2                     30 non-null     datetime64[ns]
 6   IDR                    24 non-null     datetime64[ns]
 7   FDR                    26 non-null     datetime64[ns]
 8   P3                     33 non-null     datetime64[ns]
 9   PPRR                   2 non-null      datetime64[ns]
 10  P4                     34 non-null     datetime64[ns]
 11  P5                     34 non-null     datetime64[ns]
dtypes: datetime64[ns](9), object(3)
memory usage: 5.7+ KB
None

What is the best way to determine the indices of all deadlines that differ between the two DataFrames?

js4nwp54  #1

You can use pandas' DataFrame.compare(). Both DataFrames need identical row and column labels for this to work:

import pandas as pd

df1 = pd.DataFrame(data={'Project':[1,2,3], 'DL1':['2020-09-13', '2022-11-15', '2021-01-01'], 'DL2':['2021-01-01', "2023-01-01", '2021-01-01']})
df2 = pd.DataFrame(data={'Project':[1,2,3], 'DL1':['2020-09-13', '2022-01-15', '2020-01-01'], 'DL2':['2021-01-01', '2023-01-01', '2021-03-01']})

df1.compare(df2, keep_shape=True)


  Project               DL1                     DL2            
     self other        self       other        self       other
0     NaN   NaN         NaN         NaN         NaN         NaN
1     NaN   NaN  2022-11-15  2022-01-15         NaN         NaN
2     NaN   NaN  2021-01-01  2020-01-01  2021-01-01  2021-03-01

You can also store this result and use its index to select the matching rows from the original DataFrame:

diffs = df1.compare(df2, keep_shape=True).dropna(how='all')
print(diffs)

  Project               DL1                     DL2            
     self other        self       other        self       other
1     NaN   NaN  2022-11-15  2022-01-15         NaN         NaN
2     NaN   NaN  2021-01-01  2020-01-01  2021-01-01  2021-03-01

diff_list = diffs.index
print(df1.loc[df1.index.isin(diff_list)])
   Project         DL1         DL2
1        2  2022-11-15  2023-01-01
2        3  2021-01-01  2021-01-01
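
Since the question asks where the deadlines differ, not only which rows, here is a minimal sketch that lists the (row label, column name) pair of every differing cell. It is not part of the original answer. It assumes both DataFrames share the same index and columns, and it treats a cell that is missing (NaT) in both frames as unchanged. The sample data and the names `differs` and `diff_locations` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with real datetime columns, including a missing deadline
df1 = pd.DataFrame({
    "Project": [1, 2, 3],
    "DL1": pd.to_datetime(["2020-09-13", "2022-11-15", None]),
    "DL2": pd.to_datetime(["2021-01-01", "2023-01-01", "2021-01-01"]),
})
df2 = pd.DataFrame({
    "Project": [1, 2, 3],
    "DL1": pd.to_datetime(["2020-09-13", "2022-01-15", None]),
    "DL2": pd.to_datetime(["2021-01-01", "2023-01-01", "2021-03-01"]),
})

# True where the values differ. A plain != reports NaT vs NaT as different,
# so cells that are missing in both frames are masked out explicitly.
differs = df1.ne(df2) & ~(df1.isna() & df2.isna())

# Positions of the True cells, mapped back to row labels and column names
rows, cols = differs.to_numpy().nonzero()
diff_locations = list(zip(differs.index[rows], differs.columns[cols]))

print(diff_locations)
```

With this data, row 2 of DL1 is missing in both frames and is therefore not reported, while row 1 of DL1 and row 2 of DL2 are.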

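If you want to compare one column against another, such as your P0 and P1, it gets more complicated, and I would have to spend more time working on it.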
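
One possible reading of that case is comparing two deadline columns of the same DataFrame row by row. Under that assumption, which the answer does not confirm, a rough sketch could look like the following. The sample dates and the names `mask` and `mismatched` are hypothetical.

```python
import pandas as pd

# Hypothetical frame with two deadline columns named as in the question
df = pd.DataFrame({
    "P0": pd.to_datetime(["2023-01-01", None, "2023-03-01", None]),
    "P1": pd.to_datetime(["2023-01-01", None, "2023-04-01", "2023-05-01"]),
})

p0, p1 = df["P0"], df["P1"]

# Rows where P0 and P1 differ; rows missing in both columns count as equal
mask = p0.ne(p1) & ~(p0.isna() & p1.isna())
mismatched = df.index[mask].tolist()

print(mismatched)
```

A row where only one of the two deadlines is missing is reported as a difference here; whether that is the desired behavior depends on how missing deadlines should be interpreted.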
