Select rows where a condition holds across multiple rows, in long format, using pandas groupby

holgip5t  posted on 2022-09-21  in  Other

I have a pandas DataFrame that looks like this:

import pandas as pd
foo = pd.DataFrame({'id_p': [1,1,2,2,3,3,3,4,4], 
              'id_d_b': [True, True, False, True, True, True,False,False,False], 
              'id_d_i': [False, False, True, False,False,False,True,True,True]})
foo

   id_p  id_d_b  id_d_i
0     1    True   False
1     1    True   False
2     2   False    True
3     2    True   False
4     3    True   False
5     3    True   False
6     3   False    True
7     4   False    True
8     4   False    True

I want to select the id_p values that have at least one True in id_d_b and at least one True in id_d_i.

I tried this:

foo['id_d_b'] = foo['id_d_b'].astype(int)
foo['id_d_i'] = foo['id_d_i'].astype(int)

foo['has_id_d_b'] = foo.groupby('id_p')['id_d_b'].transform('max')
foo['has_id_d_i'] = foo.groupby('id_p')['id_d_i'].transform('max')

foo['result'] = foo['has_id_d_b'] + foo['has_id_d_i'] # if this is >1 then that specific id_p has at least one id_d_b and id_d_i
foo['result'] = foo.eval('result > 1')
foo

   id_p  id_d_b  id_d_i  has_id_d_b  has_id_d_i  result
0     1       1       0           1           0   False
1     1       1       0           1           0   False
2     2       0       1           1           1    True
3     2       1       0           1           1    True
4     3       1       0           1           1    True
5     3       1       0           1           1    True
6     3       0       1           1           1    True
7     4       0       1           0           1   False
8     4       0       1           0           1   False

This gives the correct result, but I am looking for a "one-liner" solution.


py49o6xq1#

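You can use groupby.any to check whether each group contains at least one True value (optionally on a subset of the columns), then use all to make sure every column is True. With isin, you can then get the matching index values (which can be used for slicing or indexing if needed).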

m = foo.groupby('id_p').any().all(axis=1)

# if the DataFrame has other columns, select the relevant ones first:
# m = foo.groupby('id_p')[['id_d_b', 'id_d_i']].any().all(axis=1)

foo['result'] = foo['id_p'].isin(m[m].index)

To get the matching id_p values:

print(m[m].index)

# Int64Index([2, 3], dtype='int64', name='id_p')
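
To make the chain above easier to follow, here is a minimal sketch that splits it into its intermediate steps and then uses the result to keep only the matching rows. The names per_group, matching_ids and selected are illustrative and do not appear in the answer; the column subset is passed explicitly, on the assumption that foo may carry other columns.

```python
import pandas as pd

foo = pd.DataFrame({
    'id_p': [1, 1, 2, 2, 3, 3, 3, 4, 4],
    'id_d_b': [True, True, False, True, True, True, False, False, False],
    'id_d_i': [False, False, True, False, False, False, True, True, True],
})

# Step 1: for each id_p, does each flag column contain at least one True?
per_group = foo.groupby('id_p')[['id_d_b', 'id_d_i']].any()

# Step 2: a group qualifies only if both columns are True for it.
m = per_group.all(axis=1)

# Step 3: m[m] keeps the True entries; their index holds the qualifying id_p values.
matching_ids = m[m].index.tolist()

# Step 4 (illustrative): keep only the rows that belong to a qualifying id_p.
selected = foo[foo['id_p'].isin(matching_ids)]
print(matching_ids)
print(selected)
```

Boolean indexing with isin keeps the original row order and index, so selected can be joined back to foo if needed.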

Alternatively, use the more expensive groupby.transform:

foo['result'] = foo.groupby('id_p').transform('any').all(axis=1)

# if the DataFrame has other columns, select the relevant ones first:
# foo['result'] = foo.groupby('id_p')[['id_d_b', 'id_d_i']].transform('any').all(axis=1)

Output:

   id_p  id_d_b  id_d_i  result
0     1    True   False   False
1     1    True   False   False
2     2   False    True    True
3     2    True   False    True
4     3    True   False    True
5     3    True   False    True
6     3   False    True    True
7     4   False    True   False
8     4   False    True   False
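
As a quick sanity check, the sketch below computes the flag both ways (aggregate then isin, and transform) and compares them on the sample data. The names cols, via_isin and via_transform are illustrative only, and the column subset is passed explicitly so that extra columns in foo do not affect the result.

```python
import pandas as pd

foo = pd.DataFrame({
    'id_p': [1, 1, 2, 2, 3, 3, 3, 4, 4],
    'id_d_b': [True, True, False, True, True, True, False, False, False],
    'id_d_i': [False, False, True, False, False, False, True, True, True],
})

cols = ['id_d_b', 'id_d_i']

# Variant 1: one row per group, then broadcast back to the rows with isin.
m = foo.groupby('id_p')[cols].any().all(axis=1)
via_isin = foo['id_p'].isin(m[m].index)

# Variant 2: transform broadcasts the per-group result to every row directly.
via_transform = foo.groupby('id_p')[cols].transform('any').all(axis=1)

# Both variants should flag exactly the same rows.
print(via_isin.tolist() == via_transform.tolist())
```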

ia2d9nvy2#

Use Series.map with a Series created by aggregating with GroupBy.any and DataFrame.all:

foo['result'] = foo['id_p'].map(foo.groupby('id_p')[['id_d_b', 'id_d_i']].any().all(axis=1))
print (foo)
   id_p  id_d_b  id_d_i  result
0     1    True   False   False
1     1    True   False   False
2     2   False    True    True
3     2    True   False    True
4     3    True   False    True
5     3    True   False    True
6     3   False    True    True
7     4   False    True   False
8     4   False    True   False
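
A minimal sketch of what happens inside that one-liner, with the intermediate lookup Series pulled out. The names lookup, flags and selected are illustrative and not part of the answer; the last step, which uses the flags as a boolean mask, is an assumption about how the result would typically be used to select rows.

```python
import pandas as pd

foo = pd.DataFrame({
    'id_p': [1, 1, 2, 2, 3, 3, 3, 4, 4],
    'id_d_b': [True, True, False, True, True, True, False, False, False],
    'id_d_i': [False, False, True, False, False, False, True, True, True],
})

# One boolean per id_p: True when both columns have at least one True in that group.
lookup = foo.groupby('id_p')[['id_d_b', 'id_d_i']].any().all(axis=1)

# Series.map looks up each row's id_p in the index of `lookup`.
flags = foo['id_p'].map(lookup)

# Illustrative follow-up: use the flags as a mask to keep only the matching rows.
selected = foo[flags]
print(lookup)
print(selected)
```

Because map does a lookup by id_p, every id_p in foo must appear in the index of lookup, which holds here since lookup is built from foo itself.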
