Pandas Groupby:仅保留列中具有相同值的行

mv1qrgav  于 12个月前  发布在  其他
关注(0)|答案(2)|浏览(123)

我有一个列表,其中包含一个大型债券universum的数百个公司投资,根据投资的优先级,每个债券都被分配了一个推荐。假设我有以下的结构:

df = pd.DataFrame({'BOND': {0: 'A', 1: 'A', 
                        2: 'B', 3: 'B', 
                        4: 'C', 5: 'C', }, 
               'STATUS': {0: 'SENIOR', 1: 'SUBORDINATED',
                          2: 'SENIOR', 3: 'SUBORDINATED',
                          4: 'SENIOR', 5: 'SUBORDINATED',
                          },
               'STRATEGY': {0: 'BUY', 1: 'SELL',
                            2: 'SELL', 3: 'SELL',
                            4: 'BUY', 5: 'SELL',
                           }})


    BOND    STATUS       STRATEGY
0   A       SENIOR       BUY
1   A       SUBORDINATED SELL
2   B       SENIOR       SELL
3   B       SUBORDINATED SELL
4   C       SENIOR       BUY
5   C       SUBORDINATED SELL

字符串
我如何能够只选择那些在“状态”列中的所有可能类型的“策略”列中有卖出建议的债券?在这个例子中,债券B满足这些标准。本质上,我想要的输出应该如下所示:

BOND    STATUS       STRATEGY
2   B       SENIOR       SELL
3   B       SUBORDINATED SELL


很确定这可以通过Panda的内置功能“groupby”来安排。任何帮助都非常感谢!

flvlnr44

flvlnr441#

对于组中所有值的绝对匹配,可以使用groupby.transform('all')

out = df[df['STRATEGY'].eq('SELL').groupby(df['BOND']).transform('all')]

字符串
groupby.all

s = df['STRATEGY'].eq('SELL').groupby(df['BOND']).all()

out = df[df['BOND'].isin(s[s].index)]


输出量:

BOND        STATUS STRATEGY
2    B        SENIOR     SELL
3    B  SUBORDINATED     SELL


如果出于某种原因,您可以使用不同的策略多次获得相同的STATUS,您可以使用set比较(此处为每组):

s1 = df[df['STRATEGY'].eq('SELL')].groupby('BOND')['STATUS'].agg(set)
s2 = df.groupby('BOND')['STATUS'].agg(set)

s = s1.eq(s2)
out = df[df['BOND'].isin(s[s].index)]


或者将每个组与列中的整体可能策略进行比较:

s1 = df[df['STRATEGY'].eq('SELL')].groupby('BOND')['STATUS'].agg(set)
s2 = set(df['STATUS'])

s = s1.eq(s2)
out = df[df['BOND'].isin(s[s].index)]


备选输入:

BOND        STATUS STRATEGY
0    A        SENIOR      BUY
1    A  SUBORDINATED     SELL
2    B        SENIOR     SELL
3    B  SUBORDINATED     SELL
4    C        SENIOR      BUY
5    C        SENIOR     SELL
6    C  SUBORDINATED     SELL
7    D  SUBORDINATED     SELL


输出(每组):

BOND        STATUS STRATEGY
2    B        SENIOR     SELL
3    B  SUBORDINATED     SELL
4    C        SENIOR      BUY # C was selected as there is at
5    C        SENIOR     SELL # least one SELL per STATUS
6    C  SUBORDINATED     SELL
7    D  SUBORDINATED     SELL # D is included as there is no SENIOR in the group


产出(总体):
这里D被省略,因为没有匹配的'SENIOR'

BOND        STATUS STRATEGY
2    B        SENIOR     SELL
3    B  SUBORDINATED     SELL
4    C        SENIOR      BUY
5    C        SENIOR     SELL
6    C  SUBORDINATED     SELL

55ooxyrt

55ooxyrt2#

如果需要仅使用SELL策略测试所有BOND,用途:

out = df[~df['BOND'].isin(df.loc[df['STRATEGY'].ne('SELL'), 'BOND'])]
print (out)
  BOND        STATUS STRATEGY
2    B        SENIOR     SELL
3    B  SUBORDINATED     SELL

字符串
如果还需要测试BOND是否具有所有可能的STATUS值,请添加:

uniq = set(df['STATUS'])

out = out[out.groupby('BOND')['STATUS'].transform('nunique').eq(len(uniq))]
print (out)
  BOND        STATUS STRATEGY
2    B        SENIOR     SELL
3    B  SUBORDINATED     SELL


不同的输出与更改的数据:

print (df)
  BOND        STATUS STRATEGY
0    A        SENIOR      BUY
1    A  SUBORDINATED     SELL
2    B        SENIOR     SELL
3    B  SUBORDINATED     SELL
4    C        SENIOR      BUY
5    C        SENIOR     SELL
6    C  SUBORDINATED     SELL
7    D        SENIOR     SELL
8    D        SENIOR     SELL

out = df[~df['BOND'].isin(df.loc[df['STRATEGY'].ne('SELL'), 'BOND'])]
print (out)
  BOND        STATUS STRATEGY
2    B        SENIOR     SELL
3    B  SUBORDINATED     SELL
7    D        SENIOR     SELL
8    D        SENIOR     SELL
uniq = set(df['STATUS'])

out = out[out.groupby('BOND')['STATUS'].transform('nunique').eq(len(uniq))]
print (out)
  BOND        STATUS STRATEGY
2    B        SENIOR     SELL
3    B  SUBORDINATED     SELL

相关问题