pandas: how to drop_duplicates in Python

0md85ypi · published 2023-01-24 in Python

I have to compare two csv files, and I need to remove the duplicate rows and generate another file.

import pandas as pd

# Here I'm comparing the csv files: the oldest_file and the newest_file
different_data_type = newest_file.equals(other=oldest_file)

# If they have differences, I concat them to drop the rows that are equal
merged_files = pd.concat([oldest_file, newest_file])
merged_files = merged_files.drop_duplicates()
print(merged_files)

Each csv file has about 5,000 rows, but when I print merged_files I get 10,000 rows. In other words, nothing is being dropped.
How can I get only the rows that differ?
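To keep only the rows that differ between the two files, every row that appears in both can be dropped entirely with `drop_duplicates(keep=False)`. A minimal sketch with hypothetical sample data standing in for the two CSV files (same columns assumed in both):

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files
oldest_file = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
newest_file = pd.DataFrame({"id": [1, 2, 4], "value": ["a", "b", "d"]})

merged_files = pd.concat([oldest_file, newest_file])

# keep=False drops *every* member of a duplicate group,
# leaving only the rows that appear in exactly one file
only_differences = merged_files.drop_duplicates(keep=False)
print(only_differences)
```

Note that this only works if matching rows are byte-for-byte identical; stray whitespace or differing dtypes after `read_csv` will make every row look unique, which would also explain the 10,000-row result.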

cigdeys3


I think you're missing the column subset in drop_duplicates(); try something like:

df.drop_duplicates(subset=['column1', 'column2'])

Another approach is to find the duplicates in the merged file and then filter them out of merged_files:

duplicate_rows = merged_files.duplicated(subset=['column1', 'column2'])
merged_files = merged_files[~duplicate_rows]
