My data looks like this:
A B
MEDUI5945189 GATE IN
MEDUI5945189 RAIL LOAD
MEDUI5945189 GATE OUT
EBKG04830245 LOADED ON VESSEL
EBKG04830245 GATE OUT
COAU7242812270 VESSEL DEPARTURE
COAU7242812270 GATE IN
COAU7242812270 CHANGE IN SHIPMENT ETA
COAU7242812270 GATE OUT FULL
EBKG04830245 CHANGE IN SHIPMENT ETA
EBKG04830245 RAIL UNLOAD
EBKG04830245 VESSEL DEPARTURE
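I want to group by column A and return the entire group if at least one row in that group contains the word 'RAIL' in column B. The expected result is: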
A B
MEDUI5945189 GATE IN
MEDUI5945189 RAIL LOAD
MEDUI5945189 GATE OUT
EBKG04830245 LOADED ON VESSEL
EBKG04830245 GATE OUT
EBKG04830245 CHANGE IN SHIPMENT ETA
EBKG04830245 RAIL UNLOAD
EBKG04830245 VESSEL DEPARTURE
I know I need df_sel = df.groupby('A')['B'], but I'm struggling to set up the condition.
1 Answer
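You can use Series.str.contains to find the values of A whose group has at least one row containing RAIL in column B, then filter the original column A against those values with Series.isin, using boolean indexing.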
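The answer's original code did not survive on this page, so the following is only a sketch of the Series.str.contains plus Series.isin approach described above. The DataFrame is rebuilt from the sample data in the question; the names has_rail and keys are illustrative, not from the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "A": ["MEDUI5945189"] * 3 + ["EBKG04830245"] * 2
         + ["COAU7242812270"] * 4 + ["EBKG04830245"] * 3,
    "B": ["GATE IN", "RAIL LOAD", "GATE OUT",
          "LOADED ON VESSEL", "GATE OUT",
          "VESSEL DEPARTURE", "GATE IN", "CHANGE IN SHIPMENT ETA", "GATE OUT FULL",
          "CHANGE IN SHIPMENT ETA", "RAIL UNLOAD", "VESSEL DEPARTURE"],
})

# True for rows whose B contains the substring 'RAIL'
has_rail = df["B"].str.contains("RAIL")

# Values of A that have at least one such row
keys = df.loc[has_rail, "A"]

# Keep every row whose A is one of those values (boolean indexing)
df_sel = df[df["A"].isin(keys)]
print(df_sel)
```

Note that str.contains matches substrings, so a value such as 'TRAILER' would also match. If only the whole word should count, a regex like r"\bRAIL\b" is one option.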
Alternatively, use GroupBy.transform with GroupBy.any to build the mask.
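As a sketch of the GroupBy.transform with GroupBy.any variant (again a reconstruction, since the original code is missing here): the boolean Series from str.contains is grouped by column A, and transform("any") broadcasts each group's result back to every row of that group. The DataFrame is rebuilt from the question's sample data and the name mask is illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "A": ["MEDUI5945189"] * 3 + ["EBKG04830245"] * 2
         + ["COAU7242812270"] * 4 + ["EBKG04830245"] * 3,
    "B": ["GATE IN", "RAIL LOAD", "GATE OUT",
          "LOADED ON VESSEL", "GATE OUT",
          "VESSEL DEPARTURE", "GATE IN", "CHANGE IN SHIPMENT ETA", "GATE OUT FULL",
          "CHANGE IN SHIPMENT ETA", "RAIL UNLOAD", "VESSEL DEPARTURE"],
})

# Per-row flag, grouped by A; transform("any") returns a Series aligned
# with df that is True for every row of a group containing a RAIL row
mask = df["B"].str.contains("RAIL").groupby(df["A"]).transform("any")

df_sel = df[mask]
print(df_sel)
```

This version keeps the original row order and index, and avoids building an intermediate list of keys; which of the two reads better is largely a matter of taste.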