pandas: Find the last and second-to-last value in Python

Posted by baubqpgj on 2023-04-28 in Python

I have a dataset and want to see, for each position, the latest person and the second-to-last person who held it.
Sample data:

  ID    Date        Name      Job         Job_ID
  101   01/2022     Adam      Sales       1234
  101   02/2022     Adam      Sales       1234 
  101   03/2022     Adam      Sales       1234
  102   01/2022     Blake     Tech        2345
  102   02/2022     Blake     Tech        2345
  102   02/2022     Blake     Consultant  3456
  103   01/2022     Chris     Advisor     4567
  103   02/2022     Chris     Analyst     5678
  103   03/2022     Chris     Analyst     5678
  104   01/2022     Debbie    Consultant  3456
  104   02/2022     Debbie    Consultant  3456
  104   03/2022     Debbie    Tech        2345
  105   01/2022     Eric      Analyst     5678
  105   02/2022     Eric      Advisor     4567
  105   03/2022     Eric      Advisor     4567
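To run the snippets in this thread, the sample table can be rebuilt as a DataFrame. This is a minimal sketch that copies the rows as posted (including the 02/2022 date on Blake's Consultant row); the variable names `rows` and `dates` are only illustrative.

```python
import pandas as pd

# Rows copied verbatim from the sample table in the question.
rows = [
    (101, "01/2022", "Adam", "Sales", 1234),
    (101, "02/2022", "Adam", "Sales", 1234),
    (101, "03/2022", "Adam", "Sales", 1234),
    (102, "01/2022", "Blake", "Tech", 2345),
    (102, "02/2022", "Blake", "Tech", 2345),
    (102, "02/2022", "Blake", "Consultant", 3456),
    (103, "01/2022", "Chris", "Advisor", 4567),
    (103, "02/2022", "Chris", "Analyst", 5678),
    (103, "03/2022", "Chris", "Analyst", 5678),
    (104, "01/2022", "Debbie", "Consultant", 3456),
    (104, "02/2022", "Debbie", "Consultant", 3456),
    (104, "03/2022", "Debbie", "Tech", 2345),
    (105, "01/2022", "Eric", "Analyst", 5678),
    (105, "02/2022", "Eric", "Advisor", 4567),
    (105, "03/2022", "Eric", "Advisor", 4567),
]
df = pd.DataFrame(rows, columns=["ID", "Date", "Name", "Job", "Job_ID"])

# The Date column holds month/year strings; parsing them with an explicit
# format gives real timestamps that sort chronologically.
dates = pd.to_datetime(df["Date"], format="%m/%Y")
print(df.shape)  # (15, 5)
```

The DataFrame itself is left unchanged, so `df` has exactly the five columns shown in the table.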

Expected output:

ID     Name     Job          Job_ID       Latest    Prior
101    Adam     Sales        1234         Adam      Adam 
102    Blake    Consultant   3456         Blake     Debbie
103    Chris    Analyst      5678         Chris     Eric 
104    Debbie   Tech         2345         Debbie    Blake 
105    Eric     Advisor      4567         Eric      Chris

I have code to get the latest one:

manager_dict = df.groupby('ID').agg({'Name':'last'}).to_dict()['Name']
df['Latest'] = df['ID'].apply(lambda x: manager_dict[x])

I just don't know how to get the second-to-last person who held the position.
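As a side note, the dictionary-plus-apply pattern in the question can usually be written as a single groupby transform. The sketch below uses a small hypothetical frame (`toy`, with made-up names) and groups by Job_ID, on the assumption that "latest" means the latest holder of a position; the question's own snippet groups by ID.

```python
import pandas as pd

# Hypothetical toy data, already in chronological order within each job.
toy = pd.DataFrame({
    "Job_ID": [1, 1, 2, 2, 2],
    "Name": ["Ann", "Bob", "Cy", "Cy", "Dee"],
})

# Broadcast the last Name of each Job_ID group back onto every row.
toy["Latest"] = toy.groupby("Job_ID")["Name"].transform("last")

# Equivalent result via an explicit lookup dictionary, as in the question.
lookup = toy.groupby("Job_ID")["Name"].last().to_dict()
via_dict = toy["Job_ID"].map(lookup)

print(toy["Latest"].tolist())  # ['Bob', 'Bob', 'Dee', 'Dee', 'Dee']
```

Both routes give the same column; `transform` simply avoids building the intermediate dictionary.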

s71maibg  #1

You can use:

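import pandas as pd

out = (
    df.sort_values(by='Date', key=pd.to_datetime)  # chronological order
      .assign(Latest=lambda d: d['Name'],  # optional
              # second-to-last distinct name per Job_ID; with a single
              # name, len(u)-2 is -1 and selects that name
              Prior=lambda d: d.groupby('Job_ID')['Name']
                               .transform(lambda x: (u:=x.unique())[len(u)-2]))
      # keep only the last row of each Job_ID
      .loc[lambda d: ~d.duplicated(subset='Job_ID', keep='last')]
      .sort_index()
)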

Output:

     ID     Date    Name         Job  Job_ID  Latest   Prior
2   101  03/2022    Adam       Sales    1234    Adam    Adam
8   103  03/2022   Chris     Analyst    5678   Chris    Eric
10  104  02/2022  Debbie  Consultant    3456   Blake  Debbie
11  104  03/2022  Debbie        Tech    2345  Debbie   Blake
14  105  03/2022    Eric     Advisor    4567    Eric   Chris
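The indexing expression `u[len(u)-2]` in this answer needs no explicit length check, because a single distinct name yields index -1, which wraps around to that same name. A minimal sketch of that behaviour in isolation follows; the helper name `prior_by_index_trick` and the two small Series are illustrative, not part of the answer.

```python
import pandas as pd

def prior_by_index_trick(names: pd.Series):
    u = names.unique()    # distinct names, in order of first appearance
    return u[len(u) - 2]  # len(u) == 1 gives index -1, i.e. the only name

single = pd.Series(["Adam", "Adam", "Adam"])
several = pd.Series(["Blake", "Blake", "Debbie"])

print(prior_by_index_trick(single))   # Adam
print(prior_by_index_trick(several))  # Blake
```

The other answers in this thread make the same choice explicit with a `len(u) > 1` test.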
ldxq2e6h  #2

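Here is another approach: group by Job_ID first, aggregate Name with two functions to produce Latest and Prior, then join the result back to the last row of each group: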

def prior_unique(g):
    u = g.unique()
    return u[-2] if len(u) > 1 else u[-1]

gg = df.sort_values('Date', key=pd.to_datetime).drop('Date', axis=1).groupby('Job_ID')
last = gg['Name'].agg(Latest='last', Prior=prior_unique)
out = gg.agg('last').join(last).reset_index().sort_values('ID')

Output:

   Job_ID   ID    Name         Job  Latest   Prior
0    1234  101    Adam       Sales    Adam    Adam
2    3456  102   Blake  Consultant   Blake  Debbie
4    5678  103   Chris     Analyst   Chris    Eric
1    2345  104  Debbie        Tech  Debbie   Blake
3    4567  105    Eric     Advisor    Eric   Chris
wljmcqd8  #3

First create a dictionary, then use named aggregation with GroupBy.agg:

d = {c: (c, 'last') for c in df.columns}
d['Latest'] = ('Name', 'last')
d['Prior'] = ('Name', lambda x: (u:=x.unique())[-2 if len(u) > 1 else -1])
    
out = (df.sort_values('Date', key=pd.to_datetime)
         .groupby('Job_ID')
         .agg(**d)
         .reset_index(drop=True))

print (out)
    ID     Date    Name         Job  Job_ID  Latest   Prior
0  101  03/2022    Adam       Sales    1234    Adam    Adam
1  104  03/2022  Debbie        Tech    2345  Debbie   Blake
2  104  02/2022  Debbie  Consultant    3456  Debbie  Debbie
3  105  03/2022    Eric     Advisor    4567    Eric   Chris
4  103  03/2022   Chris     Analyst    5678   Chris    Eric
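All three answers rely on `Series.unique()`, which keeps names in order of first appearance. If the same person can leave a position and later return to it, that order no longer reflects who held the job most recently. One possible variation, not taken from the answers, is to collapse consecutive repeats instead; the helper `prior_by_runs` below is a hypothetical name, and the input is assumed to be sorted chronologically already.

```python
from itertools import groupby

import pandas as pd

def prior_unique(names: pd.Series):
    # Approach used in the answers: second-to-last distinct name.
    u = names.unique()
    return u[-2] if len(u) > 1 else u[-1]

def prior_by_runs(names: pd.Series):
    # Collapse consecutive repeats, so each entry is one stint in the job.
    runs = [name for name, _ in groupby(names)]
    return runs[-2] if len(runs) > 1 else runs[-1]

# Debbie holds the job, Blake takes over, then Debbie returns.
returning = pd.Series(["Debbie", "Blake", "Debbie"])

print(prior_unique(returning))   # Debbie
print(prior_by_runs(returning))  # Blake
```

Such a helper could be passed to the aggregation in place of the `unique()`-based lambda, for example `df.sort_values('Date', key=pd.to_datetime).groupby('Job_ID')['Name'].agg(prior_by_runs)`. Whether that matches the intended meaning of "Prior" depends on the data.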
