Pandas: remove duplicates deletes data when concatenating DataFrames with a DatetimeIndex

Posted by 5m1hhzi4 on 2023-06-20 under Other


I have 2 DataFrames (some values are duplicated, e.g. 2020-02-13):

>>> print(df1)
                   Val
Date                
2020-02-20         152.50
2020-02-19         152.53
2020-02-18         152.20
2020-02-13         152.28

>>> print(df2)
                   Val
Date                
2018-02-20         141.40
2018-02-21         141.37
2018-02-22         141.17
2018-02-26         141.35
2018-02-27         140.69
...                   ...
2020-02-05         152.37
2020-02-06         152.20
2020-02-10         152.03
2020-02-11         151.19
2020-02-13         152.28
[298 rows x 1 columns]

Both are indexed by Date (df1.set_index('Date')), and the dates in both DataFrames are parsed (pd.to_datetime(df1.index)). Now I want to concatenate them and drop duplicates (if any). I tried

>>> pd.concat([df1, df2])
                   Val
Date                
2018-02-20         141.40
2018-02-21         141.37
2018-02-22         141.17
2018-02-26         141.35
2018-02-27         140.69
...                   ...
2020-02-13         152.28
2020-02-20         152.50
2020-02-19         152.53
2020-02-18         152.20
2020-02-13         152.28
[302 rows x 1 columns]

I get a new df that contains the duplicate (2020-02-13). But when running

>>> pd.concat([df1, df2]).drop_duplicates()
                   Val
Date                
2018-02-20         141.40
2018-02-21         141.37
2018-02-22         141.17
2018-02-26         141.35
2018-02-27         140.69
...                   ...
2020-02-06         152.20
2020-02-10         152.03
2020-02-11         151.19
2020-02-13         152.28
2020-02-20         152.50
[299 rows x 1 columns]

It removes the duplicate, but it also removes some other values (2020-02-18 and 2020-02-19). Any idea why? What is the correct way to concatenate 2 DataFrames indexed by date?
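A minimal reproduction, assuming only the rows of df1 and df2 printed in the question (the elided "..." rows of df2 are not reconstructed). It shows that drop_duplicates() compares only the Val column, so a row with a unique date disappears when its value matches an earlier row. Which copy survives depends on the concatenation order, because the default is keep='first'.

```python
import pandas as pd

# Only rows printed in the question are used; the elided "..." rows of df2
# are not reconstructed.
df1 = pd.DataFrame({
    "Date": ["2020-02-20", "2020-02-19", "2020-02-18", "2020-02-13"],
    "Val": [152.50, 152.53, 152.20, 152.28],
})
df2 = pd.DataFrame({
    "Date": ["2020-02-05", "2020-02-06", "2020-02-10", "2020-02-11", "2020-02-13"],
    "Val": [152.37, 152.20, 152.03, 151.19, 152.28],
})

# Same preparation as in the question: parse the dates, then index by Date.
df1 = df1.assign(Date=pd.to_datetime(df1["Date"])).set_index("Date")
df2 = df2.assign(Date=pd.to_datetime(df2["Date"])).set_index("Date")

combined = pd.concat([df1, df2])
# drop_duplicates() compares column values (here only Val); the index is ignored.
deduped = combined.drop_duplicates()

print(len(combined), len(deduped))  # 9 7
# 2020-02-06 has a unique date, but its Val (152.20) already appeared on
# 2020-02-18 in df1, so it is dropped as a "duplicate".
print(pd.Timestamp("2020-02-06") in deduped.index)  # False
```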

pandas

Source: https://stackoverflow.com/questions/60393159/pandas-remove-duplicates-deletes-data-when-concatenate-data-frames-with-datetim

2 answers

Answer 1 (k4aesqcs)

Sample data:

print (df1)
               Val
Date              
2020-02-20  152.50
2020-02-19  152.53
2020-02-18  152.20
2020-02-13  152.28

print (df2)
               Val
Date              
2018-02-20  152.53
2018-02-21  141.37
2020-02-13  152.28

Concatenated together:

print (pd.concat([df1, df2]))
               Val
Date              
2020-02-20  152.50
2020-02-19  152.53
2020-02-18  152.20
2020-02-13  152.28
2018-02-20  152.53
2018-02-21  141.37
2020-02-13  152.28

Your solution removes duplicates based on all columns, which here is only the Val column; the index is not checked:

df3 = pd.concat([df1, df2]).drop_duplicates()
print (df3)
               Val
Date              
2020-02-20  152.50
2020-02-19  152.53 <-dupe
2020-02-18  152.20
2020-02-13  152.28 <-dupe
2018-02-21  141.37
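A sketch (using this answer's sample data) that flags duplicates without dropping anything, which makes the difference visible: DataFrame.duplicated() looks only at column values, while Index.duplicated() looks only at the Date labels. The report column names are illustrative.

```python
import pandas as pd

# Sample data from this answer.
df1 = pd.DataFrame(
    {"Val": [152.50, 152.53, 152.20, 152.28]},
    index=pd.to_datetime(["2020-02-20", "2020-02-19", "2020-02-18", "2020-02-13"]).rename("Date"),
)
df2 = pd.DataFrame(
    {"Val": [152.53, 141.37, 152.28]},
    index=pd.to_datetime(["2018-02-20", "2018-02-21", "2020-02-13"]).rename("Date"),
)

combined = pd.concat([df1, df2])
report = combined.assign(
    # True where the Val value already appeared in an earlier row (index ignored)
    dup_by_values=combined.duplicated().to_numpy(),
    # True where the Date label already appeared in an earlier row (values ignored)
    dup_by_index=combined.index.duplicated(),
)
print(report)
```

Only the last row (2020-02-13 from df2) is flagged by both checks; 2018-02-20 is flagged by the value check alone.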

If you convert the DatetimeIndex to a column, duplicates are removed based on all columns, here Date and Val:

df4 = pd.concat([df1, df2]).reset_index().drop_duplicates()
print (df4)
        Date     Val
0 2020-02-20  152.50
1 2020-02-19  152.53 <-not dupe, different datetime
2 2020-02-18  152.20
3 2020-02-13  152.28 <-dupe
4 2018-02-20  152.53 <-not dupe, different datetime
5 2018-02-21  141.37
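If the DatetimeIndex is wanted back after this step, one option is to call set_index('Date') again. A sketch with this answer's sample data:

```python
import pandas as pd

# Sample data from this answer.
df1 = pd.DataFrame(
    {"Val": [152.50, 152.53, 152.20, 152.28]},
    index=pd.to_datetime(["2020-02-20", "2020-02-19", "2020-02-18", "2020-02-13"]).rename("Date"),
)
df2 = pd.DataFrame(
    {"Val": [152.53, 141.37, 152.28]},
    index=pd.to_datetime(["2018-02-20", "2018-02-21", "2020-02-13"]).rename("Date"),
)

df4 = (
    pd.concat([df1, df2])
    .reset_index()       # the "Date" index becomes an ordinary column
    .drop_duplicates()   # a row is dropped only if Date AND Val both repeat
    .set_index("Date")   # turn Date back into a DatetimeIndex
)
print(df4)
```

Note that the result can still repeat a date if the two frames disagree on its value, since such rows are not exact duplicates of each other.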

If you need to remove duplicates by the DatetimeIndex only:

df5 = pd.concat([df1, df2])
df5 = df5[~df5.index.duplicated()]
print (df5)
               Val
Date              
2020-02-20  152.50
2020-02-19  152.53
2020-02-18  152.20
2020-02-13  152.28 <-dupe
2018-02-20  152.53
2018-02-21  141.37
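Index.duplicated() keeps the first occurrence by default, so with pd.concat([df1, df2]) the row from df1 wins. Passing keep='last' keeps the row from the later frame instead, and sort_index() restores chronological order. In this sketch the df2 value for 2020-02-13 (152.30) is hypothetical, chosen only so the two frames disagree and the surviving row is visible.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Val": [152.50, 152.28]},
    index=pd.to_datetime(["2020-02-20", "2020-02-13"]).rename("Date"),
)
# Hypothetical: df2 disagrees with df1 on 2020-02-13 so we can see which row survives.
df2 = pd.DataFrame(
    {"Val": [141.37, 152.30]},
    index=pd.to_datetime(["2018-02-21", "2020-02-13"]).rename("Date"),
)

combined = pd.concat([df1, df2])
# keep="first": the earlier frame (df1) wins; keep="last": the later frame (df2) wins
first_wins = combined[~combined.index.duplicated(keep="first")].sort_index()
last_wins = combined[~combined.index.duplicated(keep="last")].sort_index()

print(first_wins.loc[pd.Timestamp("2020-02-13"), "Val"])  # 152.28 (from df1)
print(last_wins.loc[pd.Timestamp("2020-02-13"), "Val"])   # 152.3 (from df2)
```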

Or remove duplicates by the Date column, specified in the subset parameter:

df51 = pd.concat([df1, df2]).reset_index().drop_duplicates(subset=['Date'])
print (df51)
        Date     Val
0 2020-02-20  152.50
1 2020-02-19  152.53
2 2020-02-18  152.20
3 2020-02-13  152.28 <-dupe
4 2018-02-20  152.53
5 2018-02-21  141.37
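A different technique, not part of this answer: DataFrame.combine_first takes the union of both indexes and, for dates present in both frames, keeps df1's value (df2 only fills dates, or NaN cells, that df1 lacks). This sketch assumes each frame has unique dates on its own.

```python
import pandas as pd

# Sample data from this answer.
df1 = pd.DataFrame(
    {"Val": [152.50, 152.53, 152.20, 152.28]},
    index=pd.to_datetime(["2020-02-20", "2020-02-19", "2020-02-18", "2020-02-13"]).rename("Date"),
)
df2 = pd.DataFrame(
    {"Val": [152.53, 141.37, 152.28]},
    index=pd.to_datetime(["2018-02-20", "2018-02-21", "2020-02-13"]).rename("Date"),
)

# Union of both indexes; where a date exists in both frames, df1's non-NaN
# value is kept. sort_index() makes the chronological order explicit.
merged = df1.combine_first(df2).sort_index()
print(merged)
```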

Answer 2 (xdyibdwo)

Does the verify_integrity option of pandas' concat method work for you? In your example, it would look like this:

df = pd.concat([df1, df2], verify_integrity=True)
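Note that verify_integrity=True does not remove duplicates: it makes pd.concat raise a ValueError when the resulting index has overlapping values. A sketch that uses it as a check and falls back to dropping repeated dates (the fallback and the small sample frames are assumptions, not part of this answer):

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Val": [152.50, 152.28]},
    index=pd.to_datetime(["2020-02-20", "2020-02-13"]).rename("Date"),
)
df2 = pd.DataFrame(
    {"Val": [141.37, 152.28]},
    index=pd.to_datetime(["2018-02-21", "2020-02-13"]).rename("Date"),
)

raised = False
try:
    # Only checks the index; raises instead of dropping the repeated date
    df = pd.concat([df1, df2], verify_integrity=True)
except ValueError as err:
    raised = True
    print("overlapping dates:", err)
    # Hypothetical fallback: keep the first row for each date
    combined = pd.concat([df1, df2])
    df = combined[~combined.index.duplicated()]

print(df)
```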
