pandas: How do I keep duplicate values across different dates in pandas?

Posted by yzuktlbb on 2023-03-11 in Other


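I have a DataFrame of users and login dates, and I'm struggling to find a way in Python/pandas to keep only the users who logged in on every unique day.
I need to remove the rows where a user logged in more than once on the same day, so that there is only one row per user per day, and then remove any user who did not log in on each specific date.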
DataFrame
| Date | User |
| --- | --- |
| 01/01/23 | Tom |
| 01/01/23 | Tom |
| 01/03/23 | Andrew |
| 02/03/23 | Andrew |
| 01/03/23 | Barry |
| 01/02/23 | Andrew |
Expected result
| Date | User |
| --- | --- |
| 01/03/23 | Andrew |
| 02/03/23 | Andrew |
| 01/02/23 | Andrew |
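
To try the answers below, the sample data can be rebuilt as a small DataFrame. This is a minimal sketch under two assumptions: the column names `Date` and `User` are taken from the answers' code, and the date strings are copied from the answers' printed output and kept as plain strings (they are not parsed, since the day/month order is not stated).

```python
import pandas as pd

# Sample data from the question; dates are kept as strings (assumption:
# the exact strings match those printed in the answers' output).
df = pd.DataFrame({
    "Date": ["01/01/23", "01/01/23", "01/03/23", "02/03/23", "01/03/23", "01/02/23"],
    "User": ["Tom", "Tom", "Andrew", "Andrew", "Barry", "Andrew"],
})

# Mark every row whose (Date, User) pair occurs more than once.
repeated = df.duplicated(keep=False)
print(df[repeated])
```

Only Tom's two rows are flagged, which is the repeated login the question wants handled before checking which users appear on every date.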

pandas

Source: https://stackoverflow.com/questions/75665093/how-do-i-keep-duplicate-values-across-different-dates-in-pandas

2 Answers

xjreopfe1#

Annotated code

import pandas as pd

# frequency table of user login count per date
s = pd.crosstab(df['User'], df['Date'])

# remove the date columns where any user has
# logged in more than once on that day
s = s.loc[:, ~(s > 1).any()]

# select the rows where the user has logged in on all remaining dates
s = s.loc[s.eq(1).all(axis=1), :]

# filter/select the rows of df whose User satisfies the above condition
df.query('User in @s.index')

Result

       Date    User
2  01/03/23  Andrew
3  02/03/23  Andrew
5  01/02/23  Andrew
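
The intermediate frequency table makes the filtering easier to follow. The sketch below rebuilds the sample data (date strings assumed to match the printed output) and splits the crosstab approach into named steps; the variable names `counts`, `has_repeat`, `kept` and `every_date` are illustrative and not part of the answer.

```python
import pandas as pd

# Sample data from the question (dates kept as plain strings).
df = pd.DataFrame({
    "Date": ["01/01/23", "01/01/23", "01/03/23", "02/03/23", "01/03/23", "01/02/23"],
    "User": ["Tom", "Tom", "Andrew", "Andrew", "Barry", "Andrew"],
})

# Rows are users, columns are dates, cells are login counts.
counts = pd.crosstab(df["User"], df["Date"])
print(counts)

# Boolean Series indexed by date: True where any user logged in more than once.
has_repeat = (counts > 1).any()

# Drop those dates, then keep users with exactly one login on every remaining date.
kept = counts.loc[:, ~has_repeat]
every_date = kept.index[kept.eq(1).all(axis=1)]

result = df[df["User"].isin(every_date)]
print(result)
```

Note that a date with a repeated login is dropped for all users, so it no longer counts toward the "every date" requirement.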

Posted 2023-03-11

snvhrwxg2#

Here is an option using drop_duplicates() and nunique()

# remove every row whose (Date, User) pair occurs more than once
df2 = df.drop_duplicates(keep=False)

# group by user and compare each user's unique date count to the total unique date count
df2.loc[df2.groupby('User')['Date'].transform('nunique').eq(df2['Date'].nunique())]

Output:

       Date    User
2  01/03/23  Andrew
3  02/03/23  Andrew
5  01/02/23  Andrew
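
The choice of `keep=False` matters here. The sketch below compares it with `keep="first"`, which would follow the question's wording of "one row per user per day" literally. The helper `users_on_every_date` is hypothetical and written only for this comparison; the sample data is rebuilt with date strings assumed to match the printed output.

```python
import pandas as pd

# Sample data from the question (dates kept as plain strings).
df = pd.DataFrame({
    "Date": ["01/01/23", "01/01/23", "01/03/23", "02/03/23", "01/03/23", "01/02/23"],
    "User": ["Tom", "Tom", "Andrew", "Andrew", "Barry", "Andrew"],
})

def users_on_every_date(frame, keep):
    """Hypothetical helper: deduplicate (Date, User) pairs with the given
    `keep` policy, then keep users seen on every remaining date."""
    deduped = frame.drop_duplicates(subset=["Date", "User"], keep=keep)
    per_user = deduped.groupby("User")["Date"].transform("nunique")
    return deduped.loc[per_user.eq(deduped["Date"].nunique())]

# keep=False drops both of Tom's rows, so 01/01/23 disappears entirely.
strict = users_on_every_date(df, keep=False)

# keep="first" leaves one Tom row, so 01/01/23 remains a date to satisfy.
literal = users_on_every_date(df, keep="first")

print(strict)
print(len(literal))
```

On this sample, `keep="first"` leaves no rows at all, because Andrew never logged in on 01/01/23; the expected result in the question is only reproduced when the repeated rows are dropped completely.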

Posted 2023-03-11
