pandas 根据 多 个 列 创建 条件 标志

ocebsuys  于 2022-11-20  发布在  其他
关注(0)|答案(2)|浏览(182)

现有数据框架:

Id       Month        Year       scheduled         completed
A         Jan         2021           0                 0
A         Feb         2021           1                 0
A         mar         2021           0                 0
B         June        2021           0                 1
B         July        2021           0                 1
B         Aug         2021           0                 1
B         Sep         2021           0                 1
C         Nov         2021           1                 0
C         Dec         2021           1                 0
C         Jan         2022           1                 0
C         Feb         2022           1                 0

预期的 Dataframe :

Id           status
 A          defaulter
 B          non_defaulter
 C          defaulter

我正在尝试为他们的每项活动创建一个状态标签。如果连续三个月completed列保持0,则该ID将被标记为“defaulter”或“non_defaulter”

xwbd5t1u

xwbd5t1u1#

Idea是聚合每个连续组0的值,每个Id和助手Series sSeries.cumsum创建,计数True的s值,然后由于可能的多个组Id聚合max和最后一组numpy.where中的值:

N = 3
m = df['completed'].ne(0)
#if need check in completed OR scheduled column
#m = df[['completed','scheduled']].ne(0).any(axis=1)

df = ((~m).groupby([df['Id'], m.cumsum().mask(m, -1)])
          .sum()
          .groupby(level=0)
          .max()
          .reset_index(name='status')
          .assign(status = lambda x: np.where(x['status'].ge(N), 
                                            'defaulter','non_defaulter')))
       
print (df)
  Id         status
0  A      defaulter
1  B  non_defaulter
2  C      defaulter
bttbmeg0

bttbmeg02#

您可以使用双groupby来计算completed中连续的1,然后确保至少有一个大于或等于N=3的延伸:

N = 3

# is the row a zero?
m = df['completed'].eq(0)

# count the consecutive zeros
(m.groupby([df['Id'], (~m).cumsum()])
   .sum().ge(N)
   # check if there is at least one stretch of value >= N
   .groupby(level=0).any()
   # convert the True/False into strings
   .map({False: 'non_defaulter', True: 'defaulter'})
   .reset_index(name='status')
)

输出量:

Id         status
0  A      defaulter
1  B  non_defaulter
2  C      defaulter

相关问题