How do I modify rows based on a condition in Python (NumPy/pandas)?

7nbnzgx9  posted on 2023-03-30  in Python

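I have an employee history dataset containing job, manager, and related information. I want to see whether one manager took over for another while that manager was absent. If that happened, append **(Sub)** next to the name of the manager who stepped in.
This is my current output: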

Emp_ID    Job_Title      Manager_pos    Manager Name     Mgr_ID
   1        Sales            627         John Doe           12
   1        Sales            627         John Doe           12
   1        Sales            627         David Stern        4
   2        Tech             324         Mark Smith         7
   2        Tech             324         Henry Ford         13
   2        Tech             324         Henry Ford         13

This is my desired output:

Emp_ID    Job_Title     Manager_pos     Manager Name      Mgr_ID
  1        Sales            627           John Doe          12
  1        Sales            627           John Doe          12
  1        Sales            627           David Stern(Sub)  4  
  2        Tech             324           Mark Smith        7 
  2        Tech             324           Henry Ford(Sub)   13 
  2        Tech             324           Henry Ford(Sub)   13

I tried using:

np.where((df['Manager_pos'].head(1) == df['Manager_pos']) & (df['Manager Name'].head(1) != df['Manager Name'].tail(1)), df['Manager Name'] + 'Sub', df['Manager Name'])

This code ends up raising an error. Any suggestions?


4c8rllxm  #1

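Assuming you want to append '(sub)' to every row whose manager differs from the first manager in its group, use groupby.transform to identify the first name, then use boolean indexing: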

m = (df.groupby(['Emp_ID', 'Manager_pos']) # for each group
     ['Manager Name'].transform('first')   # get first name
     .ne(df['Manager Name'])               # check if current row is different
    )

df.loc[m, 'Manager Name'] += '(sub)'

Output:

   Emp_ID Job_Title  Manager_pos      Manager Name  Mgr_ID
0       1     Sales          627          John Doe      12
1       1     Sales          627          John Doe      12
2       1     Sales          627  David Stern(sub)       4
3       2      Tech          324        Mark Smith       7
4       2      Tech          324   Henry Ford(sub)      13
5       2      Tech          324   Henry Ford(sub)      13
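
For reference, here is a self-contained sketch of the same idea written with np.where, since that is what the question tried. The DataFrame is rebuilt by hand from the question's table, so the construction is an assumption about how the real data looks. The likely problem with the original attempt is that head(1) and tail(1) each return a one-row Series, which cannot be compared row by row against the full column; transform('first') returns a Series of the same length as df, so the comparison lines up.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question's table (an assumption).
df = pd.DataFrame({
    "Emp_ID": [1, 1, 1, 2, 2, 2],
    "Job_Title": ["Sales"] * 3 + ["Tech"] * 3,
    "Manager_pos": [627, 627, 627, 324, 324, 324],
    "Manager Name": ["John Doe", "John Doe", "David Stern",
                     "Mark Smith", "Henry Ford", "Henry Ford"],
    "Mgr_ID": [12, 12, 4, 7, 13, 13],
})

# First manager seen in each (Emp_ID, Manager_pos) group,
# broadcast back to every row of that group.
first = df.groupby(["Emp_ID", "Manager_pos"])["Manager Name"].transform("first")

# Both operands have len(df) rows, so np.where can decide row by row.
df["Manager Name"] = np.where(
    df["Manager Name"] != first,      # manager differs from the group's first
    df["Manager Name"] + "(Sub)",     # mark as substitute
    df["Manager Name"],               # otherwise keep the name unchanged
)

print(df)
```

Running the snippet a second time on the same df would append the suffix again, so it is meant to be applied once to the raw names.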

yqyhoc1h  #2

Use a boolean mask. If the rank is greater than 1, append ' (Sub)' to the Manager Name column:

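cols = ['Emp_ID', 'Manager_pos']
# Dense rank of the names within each group, in descending alphabetical order;
# every name that is not ranked first is flagged.
m = df.groupby(cols)['Manager Name'].rank(method='dense', ascending=False).gt(1)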

df.loc[m, 'Manager Name'] += ' (Sub)'

Output:

>>> df
   Emp_ID Job_Title  Manager_pos       Manager Name  Mgr_ID
0       1     Sales          627           John Doe      12
1       1     Sales          627           John Doe      12
2       1     Sales          627  David Stern (Sub)       4
3       2      Tech          324         Mark Smith       7
4       2      Tech          324   Henry Ford (Sub)      13
5       2      Tech          324   Henry Ford (Sub)      13
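
One caveat with the rank-based mask: ranking strings orders them alphabetically, not by row position. With ascending=False, rank 1 goes to the alphabetically last name in the group, so "rank greater than 1" means "not the alphabetically last name". That matches the sample data only because the original manager happens to sort last in both groups. The sketch below uses a single hypothetical group (the names are made up for illustration) and expresses the two rules with plain Series operations to show where they diverge.

```python
import pandas as pd

# Hypothetical group: the original manager sorts alphabetically FIRST.
names = pd.Series(["Adam West", "Adam West", "Zoe Lane"])

# Equivalent of "dense descending rank > 1": not the alphabetically last name.
by_alpha = names != names.max()

# "Differs from the first row of the group": depends on row order only.
by_first = names != names.iloc[0]

print(by_alpha.tolist())  # flags the original manager's rows
print(by_first.tolist())  # flags the substitute's row
```

If row order is what identifies the original manager, comparing against the group's first name is the safer rule.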
