pandas: drop rows from a DataFrame based on a condition over sorted values

Posted by yzxexxkh on 2023-02-07 in Other

Suppose I have the following DataFrame:

import numpy as np
import pandas as pd

raw_data = {
    'code': [1, 1, 1, 1, 2, 2, 2, 2],
    'Date': ['2022-01-04', '2022-01-01', '2022-01-03', '2022-01-02', '2022-01-08', '2022-01-07', '2022-01-06', '2022-01-05'],
    'flag_check': [np.nan, np.nan, '11-33-24-33333', np.nan, np.nan, '11-55-24-33443', np.nan, np.nan],
    'rank': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]
}

df = pd.DataFrame(raw_data, columns=['code', 'Date', 'flag_check', 'rank'])

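I need to do the following:
1- Sort the entries by code, then by Date.
2- Within each code, fill the rank column with the sequence numbers 1, 2, 3, ... following the Date order.
3- Check the value of "flag_check"; if it is not null, drop all rows that come after it within the same code.
Expected result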

Answer #1 (abithluo)

Here is one approach:

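# Rank the dates within each code group (dense ranks: 1, 2, 3, ...)
df['rank'] = df.groupby('code')['Date'].rank(method='dense').astype(int)
df = df.sort_values(['code', 'Date'])
# Per code, count the flags that occur strictly before each row
# (transform keeps the original index, so the result aligns with df)
x = df.groupby('code')['flag_check'].transform(lambda s: s.shift().notna().cumsum())
# Keep only the rows that have no earlier flag in their group
df = df[x == 0].reset_index(drop=True)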

Input:

   code        Date      flag_check  rank
0     1  2022-01-04             NaN   NaN
1     1  2022-01-01             NaN   NaN
2     1  2022-01-03  11-33-24-33333   NaN
3     1  2022-01-02             NaN   NaN
4     2  2022-01-08             NaN   NaN
5     2  2022-01-07  11-55-24-33443   NaN
6     2  2022-01-06             NaN   NaN
7     2  2022-01-05             NaN   NaN

Output:

   code        Date      flag_check  rank
0     1  2022-01-01             NaN     1
1     1  2022-01-02             NaN     2
2     1  2022-01-03  11-33-24-33333     3
3     2  2022-01-05             NaN     1
4     2  2022-01-06             NaN     2
5     2  2022-01-07  11-55-24-33443     3
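
To see why the shift/notna/cumsum mask in this answer keeps the flagged row itself but drops everything after it, it can help to look at the intermediate columns. The sketch below rebuilds the question's data and adds helper columns (`is_flag`, `flags_before`, `keep`); those column names and the `flags_before` helper function are illustrative assumptions, not part of the original answer, and `transform` is used here so the result stays aligned with the original index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'code': [1, 1, 1, 1, 2, 2, 2, 2],
    'Date': ['2022-01-04', '2022-01-01', '2022-01-03', '2022-01-02',
             '2022-01-08', '2022-01-07', '2022-01-06', '2022-01-05'],
    'flag_check': [np.nan, np.nan, '11-33-24-33333', np.nan,
                   np.nan, '11-55-24-33443', np.nan, np.nan],
})
df = df.sort_values(['code', 'Date'])


def flags_before(s):
    # shift() moves every value down one row, so each row sees the flag of
    # the row above it; cumsum() then counts the flags seen strictly before
    # the current row within the group.
    return s.shift().notna().cumsum()


steps = df.assign(
    is_flag=df['flag_check'].notna(),
    flags_before=df.groupby('code')['flag_check'].transform(flags_before),
)
# A row is kept while no flag has appeared before it in its group.
steps['keep'] = steps['flags_before'] == 0
print(steps[['code', 'Date', 'is_flag', 'flags_before', 'keep']])
```

Because of the shift, the flagged row still has `flags_before == 0`, so it survives; only the rows below it in the same code group are filtered out.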

Answer #2 (uqzxnwby)

Annotated code:
# Sort by code, then by Date
s = df.sort_values(['code', 'Date'])

# Rank the Date column within each code group
s['rank'] = s.groupby('code')['Date'].rank(method='dense')

# Boolean mask of the rows to keep: scanning each group from the bottom up,
# cummax turns True at the flagged row and stays True for all earlier rows
mask = s['flag_check'].notna()[::-1].groupby(s['code']).cummax()

Result:
s[mask]

   code        Date      flag_check  rank
1     1  2022-01-01             NaN   1.0
3     1  2022-01-02             NaN   2.0
2     1  2022-01-03  11-33-24-33333   3.0
7     2  2022-01-05             NaN   1.0
6     2  2022-01-06             NaN   2.0
5     2  2022-01-07  11-55-24-33443   3.0
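
The two answers agree on the question's data because each code has exactly one flag. As a hedged illustration of where they could differ, the sketch below uses a hypothetical `demo` frame (not from the question) in which one code carries two flags: the reversed `cummax` mask keeps rows up to the last flag, while a shifted cumulative count keeps rows only up to the first flag. The names `keep_to_last` and `keep_to_first` are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical single group with two flags, already sorted by Date
demo = pd.DataFrame({
    'code': [1, 1, 1, 1, 1],
    'Date': ['2022-01-01', '2022-01-02', '2022-01-03',
             '2022-01-04', '2022-01-05'],
    'flag_check': [np.nan, 'A', np.nan, 'B', np.nan],
})

is_flag = demo['flag_check'].notna()

# Reversed cummax: True for rows at or before the LAST flag of the group.
# sort_index() restores the original row order after the reversal.
keep_to_last = is_flag[::-1].groupby(demo['code']).cummax().sort_index()

# Shifted cumulative count: number of flags strictly before each row;
# a count of 0 means the row is at or before the FIRST flag of the group.
flags_before = is_flag.astype(int).groupby(demo['code']).transform(
    lambda s: s.shift(fill_value=0).cumsum()
)
keep_to_first = flags_before == 0

print(demo[keep_to_last])   # rows up to and including flag 'B'
print(demo[keep_to_first])  # rows up to and including flag 'A'
```

Which behavior is wanted depends on how "the rows after it" should be read when a code has more than one non-null flag_check; the question's sample data does not settle that.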
