pandas: Delete row for a condition of other row values [duplicate]

Posted by zzlelutf on 2023-04-18 in Other


This question already has an answer here:

Drop duplicates keeping the row with the highest value in another column (1 answer)
Closed 4 days ago.

| | group_id | cust_id | score | x1 | x2 | contract_id | y |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 101 | 1 | 95 | F | 30 | 1 | 30 |
| 1 | 101 | 1 | 95 | F | 30 | 2 | 26 |
| 2 | 101 | 2 | 85 | M | 28 | 1 | 8 |
| 3 | 101 | 2 | 85 | M | 28 | 2 | 18 |

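I want to delete the rows that share a cust_id with another row but have the smaller y value. For example, for cust_id=1, I want to delete the row with index=1.
I am thinking of using df.loc to select the rows with the same cust_id and then dropping them based on a comparison of column y, but I don't know how to do the first part.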

pandas

Source: https://stackoverflow.com/questions/76009144/delete-row-for-a-condition-of-other-row-values

2 Answers


eit6fx6z1#

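Use sort_values to sort by y in descending order, then drop_duplicates to keep only the first occurrence of each cust_id, which is the row with the largest y: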

out = df.sort_values('y', ascending=False).drop_duplicates('cust_id')
print(out)

# Output
   group_id  cust_id  score x1  x2  contract_id   y
0       101        1     95  F  30            1  30
3       101        2     85  M  28            2  18

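As suggested by @ifly6, you can also use groupby with idxmax, which returns the index label of the row with the largest y in each cust_id group: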

out = df.loc[df.groupby('cust_id')['y'].idxmax()]
print(out)

# Output
   group_id  cust_id  score x1  x2  contract_id   y
0       101        1     95  F  30            1  30
3       101        2     85  M  28            2  18
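For anyone who wants to run both approaches end to end, here is a minimal sketch. The DataFrame is reconstructed from the table in the question; the names `by_sort`, `by_idxmax`, and `same_rows` are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Data reconstructed from the question's table (an assumption about the real df)
df = pd.DataFrame({
    "group_id": [101, 101, 101, 101],
    "cust_id": [1, 1, 2, 2],
    "score": [95, 95, 85, 85],
    "x1": ["F", "F", "M", "M"],
    "x2": [30, 30, 28, 28],
    "contract_id": [1, 2, 1, 2],
    "y": [30, 26, 8, 18],
})

# Approach 1: sort by y descending, keep the first row seen per cust_id
by_sort = df.sort_values("y", ascending=False).drop_duplicates("cust_id")

# Approach 2: look up the index label of the max y per cust_id
by_idxmax = df.loc[df.groupby("cust_id")["y"].idxmax()]

# Compare after aligning row order; both should select rows 0 and 3
same_rows = by_sort.sort_index().equals(by_idxmax.sort_index())
print(by_idxmax)
print(same_rows)
```

On this data both approaches select the same rows. They can differ when two rows of one customer tie on y, since which tied row survives depends on the sort, so it is worth checking if ties matter in your data.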


1wnzp6jl2#

You can use drop_duplicates. Here is an example:

import pandas as pd
# Some data
df = pd.DataFrame({'cust_id': [1, 2, 1, 3, 4], 'y': [3, 4, 1, 5, 7]})
# Sorting by cust_id is actually not necessary
df.sort_values(by=['cust_id', 'y'], ascending=[True, True], inplace=True)
# Remove duplicates by cust_id, keeping the last one found (the largest y after the ascending sort)
df.drop_duplicates(subset='cust_id', keep='last', inplace=True)

print(df)

   cust_id  y
0        1  3
1        2  4
3        3  5
4        4  7
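Since the question asked about selecting rows with df.loc, a boolean mask is another option. This is a sketch that is not part of either answer: it reuses the sample data from this answer, and `group_max` is an illustrative name. `transform('max')` broadcasts each group's maximum y back onto every row of that group, so the comparison gives one True/False per row.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({'cust_id': [1, 2, 1, 3, 4], 'y': [3, 4, 1, 5, 7]})

# For every row, the maximum y within that row's cust_id group
group_max = df.groupby('cust_id')['y'].transform('max')

# Keep the rows whose y equals their group's maximum.
# Unlike drop_duplicates, this keeps every tied row.
out = df.loc[df['y'] == group_max]
print(out)
```

This keeps the original row order and needs no sort. If exactly one row per cust_id is required even when y ties, drop_duplicates or idxmax is the safer choice.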
