I have the following DataFrame:
import pandas as pd
df = pd.DataFrame()
df['number'] = (651,651,651,4267,4267,4267,4267,4267,4267,4267,8806,8806,8806,6841,6841,6841,6841)
df['name']=('Alex','Alex','Alex','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Abhishek','Abhishek','Abhishek','Blake','Blake','Blake','Blake')
df['hours']=(8.25,7.5,7.5,7.5,14,12,15,11,6.5,14,15,15,13.5,8,8,8,8)
df['loc']=('Nar','SCC','RSL','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNI','UNI','UNI','UNKING','UNKING','UNKING','UNKING')
print(df)
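When the running total of hours for a single number passes 38, the row in which the 38th hour is reached is adjusted: that row is duplicated, the first copy keeps only the hours needed to reach 38, and the remaining hours are carried over to the inserted copy. The code below does this; comparing the two printed outputs shows the difference between the original and the adjusted data.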
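# Running total of hours within each 'number' group
s = df.groupby('number')['hours'].cumsum()
# True where the running total exceeds 38
m = s.gt(38)
# First row per group where the total exceeds 38
# (idxmax returns the group's first row if none does, hence the `& m` below)
idx = m.groupby(df['number']).idxmax()
# Hours still available before reaching 38 at each row
delta = s.groupby(df['number']).shift().rsub(38).fillna(s)
# Repeat each crossing row once; copy so the assignments act on an independent frame
out = df.loc[df.index.repeat((df.index.isin(idx)&m)+1)].copy()
# First copy keeps the hours up to 38, second copy gets the remainder
out.loc[out.index.duplicated(keep='last'), 'hours'] = delta
out.loc[out.index.duplicated(), 'hours'] -= delta
print(out)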
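I need to flag both the adjusted rows and the inserted rows by adding another column that holds a marker character (such as "x"), so that they stand out.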
1 Answer
Because the repeated rows share the same index label, you can use out.index.duplicated as a boolean mask.
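A minimal sketch of that idea, not necessarily the exact code the answer had in mind. It assumes the marker column is called `flag` (a hypothetical name) and uses `np.where` as one possible way to fill it. Passing `keep=False` to `duplicated` marks every occurrence of a repeated index label, so both the adjusted row and the inserted row are flagged. The `name` and `loc` columns are left out to keep the example short.

```python
import numpy as np
import pandas as pd

# Same 'number' and 'hours' values as in the question
df = pd.DataFrame({
    'number': [651] * 3 + [4267] * 7 + [8806] * 3 + [6841] * 4,
    'hours': [8.25, 7.5, 7.5, 7.5, 14, 12, 15, 11, 6.5, 14,
              15, 15, 13.5, 8, 8, 8, 8],
})

# Adjustment logic from the question
s = df.groupby('number')['hours'].cumsum()
m = s.gt(38)
idx = m.groupby(df['number']).idxmax()
delta = s.groupby(df['number']).shift().rsub(38).fillna(s)
out = df.loc[df.index.repeat((df.index.isin(idx) & m) + 1)].copy()
out.loc[out.index.duplicated(keep='last'), 'hours'] = delta
out.loc[out.index.duplicated(), 'hours'] -= delta

# keep=False marks all rows whose index label occurs more than once,
# i.e. the adjusted row and the inserted row; 'flag' is a hypothetical column name
out['flag'] = np.where(out.index.duplicated(keep=False), 'x', '')
print(out)
```

To flag only the inserted rows, `out.index.duplicated()` (default `keep='first'`) can be used as the mask instead; `keep='last'` flags only the adjusted rows.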