Python Pandas: get the latest record based on a change in Level

5kgi1eie posted on 2023-01-28 in Python

The requirement is to get, for each ID, the latest record at which the Level changed. Example:

import pandas as pd

df = pd.DataFrame({
  'ID': [1,1,1,1,1,1,1,1,
         2,2,2,2,
         3,3,3,3],
  'Month': ['01-09-2020','01-10-2020','01-11-2020','01-12-2020','01-01-2021','01-02-2021','01-03-2021','01-04-2021',
            '01-01-2020','01-02-2020','01-03-2020','01-04-2020',
            '01-01-2021','01-02-2021','01-03-2021','01-04-2021'],
  'Level': [66,66,66,65,65,65,42,32,
            42,42,41,41,
            65,64,64,64],
  'Grade': [1,1,1,1,1,1,1,1,
            2,2,2,2,
            2,2,2,1],
  'Flag_Level': ['Y','Y','Y','Y','Y','Y','Y','Y',
                 'N','N','N','N',
                 'Y','Y','Y','Y']
})

Output of df:

    ID       Month  Level  Grade Flag_Level
0    1  01-09-2020     66      1          Y
1    1  01-10-2020     66      1          Y
2    1  01-11-2020     66      1          Y
3    1  01-12-2020     65      1          Y
4    1  01-01-2021     65      1          Y
5    1  01-02-2021     65      1          Y
6    1  01-03-2021     42      1          Y
7    1  01-04-2021     32      1          Y
8    2  01-01-2020     42      2          N
9    2  01-02-2020     42      2          N
10   2  01-03-2020     41      2          N
11   2  01-04-2020     41      2          N
12   3  01-01-2021     65      2          Y
13   3  01-02-2021     64      2          Y
14   3  01-03-2021     64      2          Y
15   3  01-04-2021     64      1          Y

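**Description of the DataFrame columns:**

ID: each ID has one record per month, so several records overall (in the sample above, ID 1 has 8 records, i.e. 8 months, and ID 2 has 4).
Level: the level of the ID in each month; Grade likewise.
Flag_Level: Y if there is any change in Level for the ID, otherwise N. For IDs 1 and 3 the Level column changes, hence Y, whereas ID 2 is marked as having no change in the Level column.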
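One possible reading of the Flag_Level description is "Y when an ID has more than one distinct Level". The sketch below derives a flag under that assumption only; `Flag_Derived` is a column name introduced here for illustration and is not part of the original data. Under this reading ID 2 also comes out as Y, because its Level goes from 42 to 41, so the N in the sample for ID 2 may follow a rule that is not stated in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID':    [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'Level': [66, 66, 66, 65, 65, 65, 42, 32, 42, 42, 41, 41, 65, 64, 64, 64],
})

# Assumption: an ID "has a level change" if it holds more than one distinct Level
changed = df.groupby('ID')['Level'].transform('nunique') > 1

# Hypothetical column, for comparison against the Flag_Level given in the question
df['Flag_Derived'] = np.where(changed, 'Y', 'N')

print(df.groupby('ID')['Flag_Derived'].first())
```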

Code tried:

df_filtered = df[
  ( df.Level < df.groupby('ID').Level.shift(1) )
]

Here Level 64 ranks higher than Level 65, and so on.
After running the code above, the actual output is:
df_filtered:

    ID       Month  Level  Grade Flag_Level
3    1  01-12-2020     65      1          Y
6    1  01-03-2021     42      1          Y
7    1  01-04-2021     32      1          Y
10   2  01-03-2020     41      2          N
13   3  01-02-2021     64      2          Y
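To see why this filter keeps every drop in Level rather than only the latest one, it can help to print the shifted column next to Level. This is a minimal sketch that uses only the ID and Level columns from the sample; `prev_level` and `dropped` are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID':    [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'Level': [66, 66, 66, 65, 65, 65, 42, 32, 42, 42, 41, 41, 65, 64, 64, 64],
})

# Level of the previous row within the same ID (NaN for the first row of each ID)
prev_level = df.groupby('ID')['Level'].shift(1)

# True wherever Level is lower than in the previous row of the same ID;
# comparisons against NaN evaluate to False, so first rows are never selected
dropped = df['Level'] < prev_level

print(df.assign(prev_level=prev_level, dropped=dropped))
print(dropped[dropped].index.tolist())  # [3, 6, 7, 10, 13]
```

Every row where the mask is True is kept, which is why ID 1 contributes three rows. Reducing that to one row per ID needs a further per-ID step.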

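Expected output: for each ID, only the record for the most recent month in which the level changed. For ID 1, which has 3 entries in the output above, only the latest level change should remain, i.e. the record at index 7 (01-04-2021, Level 32), as below.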

    ID       Month  Level  Grade Flag_Level
7    1  01-04-2021     32      1          Y
10   2  01-03-2020     41      2          N
13   3  01-02-2021     64      2          Y
Answer #1 (41zrol4v):

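I'm not 100% sure what the final output should be, or even which filter to apply, but this code keeps the rows where Level decreased and then takes the last such row for each ID.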

import pandas as pd

df = pd.DataFrame({
  'ID': [1,1,1,1,1,1,1,1,
         2,2,2,2,
         3,3,3,3],
  'Month': ['01-09-2020','01-10-2020','01-11-2020','01-12-2020','01-01-2021','01-02-2021','01-03-2021','01-04-2021',
            '01-01-2020','01-02-2020','01-03-2020','01-04-2020',
            '01-01-2021','01-02-2021','01-03-2021','01-04-2021'],
  'Level': [66,66,66,65,65,65,42,32,
            42,42,41,41,
            65,64,64,64],
  'Grade': [1,1,1,1,1,1,1,1,
            2,2,2,2,
            2,2,2,1],
  'Flag_Level': ['Y','Y','Y','Y','Y','Y','Y','Y',
                 'N','N','N','N',
                 'Y','Y','Y','Y']
})

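# Keep rows whose Level is lower than the previous row's Level for the same ID,
# then take the last such row per ID (ID becomes the index of the result).
df_filtered = df[
  ( df.Level < df.groupby('ID').Level.shift(1) )
].groupby('ID').last()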

print(df_filtered)

https://trinket.io/python3/3126bf278f
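The expected output in the question keeps the original row labels (7, 10, 13), whereas `groupby('ID').last()` returns a frame indexed by ID. If the original index matters, one possible variation is `groupby('ID').tail(1)`, which returns the selected rows unchanged. This is a sketch of that alternative, not part of the code above; `decreased` and `latest` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'Month': ['01-09-2020', '01-10-2020', '01-11-2020', '01-12-2020',
              '01-01-2021', '01-02-2021', '01-03-2021', '01-04-2021',
              '01-01-2020', '01-02-2020', '01-03-2020', '01-04-2020',
              '01-01-2021', '01-02-2021', '01-03-2021', '01-04-2021'],
    'Level': [66, 66, 66, 65, 65, 65, 42, 32, 42, 42, 41, 41, 65, 64, 64, 64],
})

# Rows where Level is lower than the previous row of the same ID
decreased = df[df['Level'] < df.groupby('ID')['Level'].shift(1)]

# Last such row per ID; tail(1) keeps ID as a column and preserves the row labels
latest = decreased.groupby('ID').tail(1)

print(latest)
```

This assumes the rows are already in date order within each ID, as they are in the sample.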

***EDIT:*** Handling unordered data, data types, conversions...

import pandas as pd

df = pd.DataFrame({
  'ID': [1,1,1,1,1,1,1,1,
         2,2,2,2,
         3,3,3,3],
  'Month': ['01-01-2021','01-02-2021','01-03-2021','01-04-2021','01-09-2020','01-11-2020','01-10-2020','01-12-2020',
            '01-01-2020','01-02-2020','01-03-2020','01-04-2020',
            '01-01-2021','01-02-2021','01-03-2021','01-04-2021'],
  'Level': [65,65,42,32,66,66,66,65,
            42,42,41,41,
            65,64,64,64],
  'Grade': [1,1,1,1,1,1,1,1,
            2,2,2,2,
            2,2,2,1],
  'Flag_Level': ['Y','Y','Y','Y','Y','Y','Y','Y',
                 'N','N','N','N',
                 'Y','Y','Y','Y']
})

print()
print('Non-Date Ordered...')
print('~~~~~~~~~~~~~~~~~~~')
print()
print(df)

print()
print(df.dtypes)

print()
print(
  df[ df.Level < df.groupby('ID').Level.shift(1) ].groupby('ID').last()
)

df['Month'] = pd.to_datetime(df['Month'], format='%d-%m-%Y')
df['Flag_Level'] = df['Flag_Level'].astype('category')

print()
print('Changed Datatypes')
print('~~~~~~~~~~~~~~~~~')
print()
print(df.dtypes)
print()
print(df)

df = df.sort_values(['ID', 'Month']).reset_index(drop=True) # dropping the index is optional

print()
print('Date Ordered...')
print('~~~~~~~~~~~~~~~')

print()
print(df)

print()
print(
  df[ df.Level < df.groupby('ID').Level.shift(1) ].groupby('ID').last()
)

https://trinket.io/python3/7c44cb99d9
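The steps in the edit (parse the dates, sort within each ID, filter, take the last row) can be folded into one helper. The following is a rough sketch under the assumption that Month is always in `%d-%m-%Y` format; `latest_level_drop` is a hypothetical name, not something from pandas or from the code above.

```python
import pandas as pd

def latest_level_drop(frame):
    """Return, per ID, the most recent row whose Level is lower than the previous month's."""
    out = frame.copy()
    # Parse day-month-year strings so that sorting is chronological, not alphabetical
    out['Month'] = pd.to_datetime(out['Month'], format='%d-%m-%Y')
    out = out.sort_values(['ID', 'Month'])
    # Compare each row with the previous row of the same ID, in date order
    dropped = out['Level'] < out.groupby('ID')['Level'].shift(1)
    return out[dropped].groupby('ID').tail(1)

# Same unordered sample as in the edit, reduced to the columns the helper needs
df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'Month': ['01-01-2021', '01-02-2021', '01-03-2021', '01-04-2021',
              '01-09-2020', '01-11-2020', '01-10-2020', '01-12-2020',
              '01-01-2020', '01-02-2020', '01-03-2020', '01-04-2020',
              '01-01-2021', '01-02-2021', '01-03-2021', '01-04-2021'],
    'Level': [65, 65, 42, 32, 66, 66, 66, 65, 42, 42, 41, 41, 65, 64, 64, 64],
})

result = latest_level_drop(df)
print(result)
```

On this unordered sample, skipping the sort would select the 01-12-2020 row (Level 65) for ID 1 instead of the 01-04-2021 row (Level 32), which is why the sort comes before the comparison.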
