pandas 如何在python中选择某个值以上的行并按不同的列分组?

ou6hu8tu  于 2023-03-16  发布在  Python
关注(0)|答案(3)|浏览(126)

我有以下数据框:

Column A    Column B
MD223       GATE IN
MD223       GATE OUT
MD223       LOADED
MD223       DEPARTURE
SC511       GATE IN
SC511       LOADED
SC511       SHIPPED
KR977       DISPATCHED
KR977       LOADED
KR977       SHIPPED

我尝试做的是,对于A列中具有相同值的每一组,在B列中找到值为'LOADED'的第一行,然后返回值'LOADED'之前的行。

Column A    Column B
MD223       GATE IN
MD223       GATE OUT
MD223       LOADED
SC511       GATE IN
SC511       LOADED
KR977       DISPATCHED
KR977       LOADED

我会很感激你的帮助。

更新

初始 Dataframe 在列B中有重复,但结果必须相同。

Column A    Column B
MD223       GATE IN
MD223       GATE OUT
MD223       LOADED
MD223       DEPARTURE
MD223       LOADED
SC511       GATE IN
SC511       LOADED
SC511       SHIPPED
SC511       LOADED
KR977       DISPATCHED
KR977       LOADED
KR977       SHIPPED

预期成果:

Column A    Column B
MD223       GATE IN
MD223       GATE OUT
MD223       LOADED
SC511       GATE IN
SC511       LOADED
KR977       DISPATCHED
KR977       LOADED
gpfsuwkq

gpfsuwkq1#

让我们对A列中的每个组执行cummax,以标记第一个LOADED之前的所有行,然后使用loc选择所有标记的行

mask = df['Column B'] == 'LOADED' 
df.loc[mask[::-1].groupby(df['Column A']).cummax()]

更新:如果Column B中有重复项

mask = df['Column B'].eq('LOADED') & ~df.duplicated(subset=['Column A', 'Column B'])
df.loc[mask[::-1].groupby(df['Column A']).cummax()]

结果

Column A    Column B
0    MD223     GATE IN
1    MD223    GATE OUT
2    MD223      LOADED
4    SC511     GATE IN
5    SC511      LOADED
7    KR977  DISPATCHED
8    KR977      LOADED
gfttwv5a

gfttwv5a2#

Column A分组,并将每组切片到由LOADED值表示的第一个有效索引:

df.groupby('Column A', sort=False)\
    .apply(lambda x: x.loc[:x['Column B'].eq('LOADED').idxmax()])\
    .reset_index(drop=True)
Column A    Column B
0    MD223     GATE IN
1    MD223    GATE OUT
2    MD223      LOADED
3    SC511     GATE IN
4    SC511      LOADED
5    KR977  DISPATCHED
6    KR977      LOADED
sf6xfgos

sf6xfgos3#

另一种可能的解决方案,也适用于存在重复项的情况:

df.loc[(df.groupby('Column A', sort=False)['Column B']
        .transform(lambda g: ~g.shift().eq('LOADED').cummax())),:]

输出:

Column A    Column B
0     MD223     GATE IN
1     MD223    GATE OUT
2     MD223      LOADED
5     SC511     GATE IN
6     SC511      LOADED
9     KR977  DISPATCHED
10    KR977      LOADED

相关问题