I have a pandas DataFrame like this:
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 7, 8, 8, 9, 9],
                   'value': ["GC", "GD", "GD", "GQ", "GQ", "GR", "LA", "LK", "LK",
                             "HA", "HE", "HE", "JB", "JB", "JF", "JF", "JJ"]})
I want to group by id, compare the last value of one group with the first value of the next group, and produce a new column, as shown below.
id  value  status
1   GC     na
1   GD     different
2   GD     same
2   GQ     different
3   GQ     same
3   GR     different
4   LA     different
4   LK     different
5   LK     same
6   HA     different
6   HE     different
7   HE     same
7   JB     different
8   JB     same
8   JF     different
9   JF     same
9   JJ     na
I tried the code below, but it seems to compare the first and last values within the same group:
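def check_status(group):
    # Keep only the first and last rows of the group
    selected = [False] * len(group)
    selected[0] = selected[-1] = True
    new_group = group[selected].copy()  # copy to avoid SettingWithCopyWarning
    # This compares the first and last values inside one group
    new_group['status'] = 'different' if new_group.value.is_unique else 'same'
    return new_group

df.groupby('id').apply(check_status).reset_index(drop=True)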
I would appreciate any kind of help. Thank you.
2 Answers
nimxete2 #1
Try this:
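A minimal sketch of one possible approach, not necessarily the code originally posted in this answer. It assumes the expected table amounts to comparing each row's value with the value in the previous row, with the first and last rows marked na because they have no counterpart. The variable name prev and the use of numpy.where are choices made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 7, 8, 8, 9, 9],
    'value': ["GC", "GD", "GD", "GQ", "GQ", "GR", "LA", "LK", "LK",
              "HA", "HE", "HE", "JB", "JB", "JF", "JF", "JJ"],
})

# Value from the previous row (NaN for the very first row)
prev = df['value'].shift()

# 'same' where a row repeats the previous row's value, else 'different'
df['status'] = np.where(df['value'] == prev, 'same', 'different')

# Assumption: the first and last rows have nothing to pair with, so mark them 'na'
df.loc[df.index[[0, -1]], 'status'] = 'na'

print(df)
```

Because the rows are already sorted by id, the first row of each group is compared with the last row of the previous group without an explicit groupby. If a group could contain more than two rows, the desired status for its middle rows would need to be defined first.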
3vpjnl9f #2
I tried checking with a for loop instead. It is not optimized, but it is another approach.
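A loop-based version might look roughly like the sketch below. It is an illustration, not the code originally posted here, and it rests on the same assumption about the expected table: each row is compared with the row before it, and the first and last rows are marked na. The names values and statuses are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 7, 8, 8, 9, 9],
    'value': ["GC", "GD", "GD", "GQ", "GQ", "GR", "LA", "LK", "LK",
              "HA", "HE", "HE", "JB", "JB", "JF", "JF", "JJ"],
})

values = df['value'].tolist()
statuses = []

for i, current in enumerate(values):
    if i == 0 or i == len(values) - 1:
        # Assumption: the first and last rows have nothing to pair with
        statuses.append('na')
    elif current == values[i - 1]:
        statuses.append('same')
    else:
        statuses.append('different')

df['status'] = statuses
print(df)
```

An explicit Python loop is slower than a vectorized comparison on large frames, but it makes each decision easy to follow and to adjust.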