pandas: how to drop_duplicates in Python

0md85ypi · published 2023-01-24 in Python

I have to compare two csv files, and I need to remove the duplicate rows and generate another file.

import pandas as pd

# Here I'm comparing the csv files: the oldest_file and the newest_file
different_data_type = newest_file.equals(other=oldest_file)

# If they have differences, I concat them to drop the rows that are equal
merged_files = pd.concat([oldest_file, newest_file])
merged_files = merged_files.drop_duplicates()
print(merged_files)

Each csv file has about 5,000 rows, but when I print merged_files I get 10,000 rows. In other words, nothing is being dropped.
How can I get only the rows that differ?
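To keep only the rows that differ between the two files, every row that appears in both can be dropped entirely with `drop_duplicates(keep=False)`. A minimal sketch with hypothetical sample data standing in for the two CSV files (same columns assumed in both):

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files
oldest_file = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
newest_file = pd.DataFrame({"id": [1, 2, 4], "value": ["a", "b", "d"]})

merged_files = pd.concat([oldest_file, newest_file])

# keep=False drops *every* member of a duplicate group,
# leaving only the rows that appear in exactly one file
only_differences = merged_files.drop_duplicates(keep=False)
print(only_differences)
```

Note that this only works if matching rows are byte-for-byte identical; stray whitespace or differing dtypes after `read_csv` will make every row look unique, which would also explain the 10,000-row result.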

cigdeys3


I think you're missing the column subset in drop_duplicates(); try something like:

df.drop_duplicates(subset=['column1', 'column2'])

Another approach is to find the duplicates in the merged file and then filter them out of merged_files:

duplicate_rows = merged_files.duplicated(subset=['column1', 'column2'])
merged_files = merged_files[~duplicate_rows]
