Pandas: How to group by a column and select the groups that contain certain values?

Posted by vuktfyat on 2023-03-28 in Other

My data looks like this:

A                B
MEDUI5945189     GATE IN
MEDUI5945189     RAIL LOAD
MEDUI5945189     GATE OUT
EBKG04830245     LOADED ON VESSEL
EBKG04830245     GATE OUT
COAU7242812270   VESSEL DEPARTURE
COAU7242812270   GATE IN
COAU7242812270   CHANGE IN SHIPMENT ETA
COAU7242812270   GATE OUT FULL
EBKG04830245     CHANGE IN SHIPMENT ETA
EBKG04830245     RAIL UNLOAD
EBKG04830245     VESSEL DEPARTURE

I want to group by column A and return the whole group if at least one of its rows contains the word 'RAIL' in column B. The expected result is:

A                B
MEDUI5945189     GATE IN
MEDUI5945189     RAIL LOAD
MEDUI5945189     GATE OUT
EBKG04830245     LOADED ON VESSEL
EBKG04830245     GATE OUT
EBKG04830245     CHANGE IN SHIPMENT ETA
EBKG04830245     RAIL UNLOAD
EBKG04830245     VESSEL DEPARTURE

I know I need df_sel = df.groupby('A')['B'], but I am struggling to set the condition.

Answer #1 (relj7zay)

You can use Series.str.contains to find the rows where B contains RAIL, take their values of A, and then filter the original column A with Series.isin in boolean indexing:

df1 = df[df.A.isin(df.loc[df['B'].str.contains('RAIL'), 'A'])]
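To make the one-liner above easier to follow, here is a minimal self-contained sketch that rebuilds the sample data from the question and splits the expression into steps. The intermediate names has_rail and rail_keys are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["MEDUI5945189"] * 3 + ["EBKG04830245"] * 2
         + ["COAU7242812270"] * 4 + ["EBKG04830245"] * 3,
    "B": ["GATE IN", "RAIL LOAD", "GATE OUT",
          "LOADED ON VESSEL", "GATE OUT",
          "VESSEL DEPARTURE", "GATE IN", "CHANGE IN SHIPMENT ETA", "GATE OUT FULL",
          "CHANGE IN SHIPMENT ETA", "RAIL UNLOAD", "VESSEL DEPARTURE"],
})

# Step 1: boolean Series, True where B contains the substring 'RAIL'.
has_rail = df["B"].str.contains("RAIL")

# Step 2: values of A on those rows (the keys of the groups to keep).
rail_keys = df.loc[has_rail, "A"]

# Step 3: keep every row whose A is one of those keys.
df1 = df[df["A"].isin(rail_keys)]
print(df1)
```

Because the filter is applied to the original frame, the rows keep their original order and index labels.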

Alternatively, use GroupBy.transform with GroupBy.any to build the mask:

df1 = df[df['B'].str.contains('RAIL').groupby(df.A).transform('any')]
print(df1)
               A                       B
0   MEDUI5945189                 GATE IN
1   MEDUI5945189               RAIL LOAD
2   MEDUI5945189                GATE OUT
3   EBKG04830245        LOADED ON VESSEL
4   EBKG04830245                GATE OUT
9   EBKG04830245  CHANGE IN SHIPMENT ETA
10  EBKG04830245             RAIL UNLOAD
11  EBKG04830245        VESSEL DEPARTURE
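For comparison, the sketch below runs the transform('any') mask on the question's sample data next to a third option, DataFrameGroupBy.filter. The filter variant and the na=False argument (which treats missing values in B as non-matches) are additions for illustration and are not part of the original answer. On frames with many groups, the filter version may be slower because it calls a Python function once per group.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["MEDUI5945189"] * 3 + ["EBKG04830245"] * 2
         + ["COAU7242812270"] * 4 + ["EBKG04830245"] * 3,
    "B": ["GATE IN", "RAIL LOAD", "GATE OUT",
          "LOADED ON VESSEL", "GATE OUT",
          "VESSEL DEPARTURE", "GATE IN", "CHANGE IN SHIPMENT ETA", "GATE OUT FULL",
          "CHANGE IN SHIPMENT ETA", "RAIL UNLOAD", "VESSEL DEPARTURE"],
})

# Mask approach: per-row match, then broadcast "any match in my group"
# back to every row of that group.
mask = df["B"].str.contains("RAIL", na=False).groupby(df["A"]).transform("any")
df_mask = df[mask]

# filter approach: keep a group only if the function returns True for it.
df_filter = df.groupby("A").filter(
    lambda g: g["B"].str.contains("RAIL", na=False).any()
)

print(df_mask)
print(df_filter)
```

Both variants select the same rows for this data, so the choice is mostly about readability and speed.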
