pandas python panda Dataframe ，添加列并标记调整和插入的行

wvt8vs2t 于 2023-02-20 发布在 Python

关注(0)|答案(1)|浏览(132)

我有以下的数据框架

import pandas as pd
df = pd.DataFrame()
df['number'] = (651,651,651,4267,4267,4267,4267,4267,4267,4267,8806,8806,8806,6841,6841,6841,6841)
df['name']=('Alex','Alex','Alex','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Abhishek','Abhishek','Abhishek','Blake','Blake','Blake','Blake')
df['hours']=(8.25,7.5,7.5,7.5,14,12,15,11,6.5,14,15,15,13.5,8,8,8,8)
df['loc']=('Nar','SCC','RSL','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNI','UNI','UNI','UNKING','UNKING','UNKING','UNKING')
print(df)

如果单个小时数的运行余额达到38小时，则对达到第38小时的单元格进行调整，插入重复行，并将小时数余额添加到下一行。以下代码执行此操作，可以看到原始数据与调整后数据的输出差异。

s = df.groupby('number')['hours'].cumsum()
m = s.gt(38)
idx = m.groupby(df['number']).idxmax()
delta = s.groupby(df['number']).shift().rsub(38).fillna(s)
out = df.loc[df.index.repeat((df.index.isin(idx)&m)+1)]
out.loc[out.index.duplicated(keep='last'), 'hours'] = delta
out.loc[out.index.duplicated(), 'hours'] -= delta
print(out)

对于调整的行和插入的行，我需要通过插入另一列并添加字符（如“x”）来标记它们，以突出显示调整和插入的行

pandas

来源：https://stackoverflow.com/questions/75496801/python-pandas-dataframe-add-column-and-tag-an-adjusted-and-inserted-row

1条答案

按热度按时间

xdnvmnnf1#

复制索引时，可以使用out.index.duplicated作为布尔掩码：

# or out['mod'] = np.where(out.index.duplicated(keep=False), 'x', '-')
out.loc[out.index.duplicated(keep=False), 'mod'] = 'x'
print(out)

# Output
    number      name  hours     loc  mod
0      651      Alex   8.25     Nar  NaN
1      651      Alex   7.50     SCC  NaN
2      651      Alex   7.50     RSL  NaN
3     4267     Ankit   7.50  UNIT-C  NaN
4     4267     Ankit  14.00  UNIT-C  NaN
5     4267     Ankit  12.00  UNIT-C  NaN
6     4267     Ankit   4.50  UNIT-C    x  # index 6
6     4267     Ankit  10.50  UNIT-C    x  # twice
7     4267     Ankit  11.00  UNIT-C  NaN
8     4267     Ankit   6.50  UNIT-C  NaN
9     4267     Ankit  14.00  UNIT-C  NaN
10    8806  Abhishek  15.00     UNI  NaN
11    8806  Abhishek  15.00     UNI  NaN
12    8806  Abhishek   8.00     UNI    x  # index 12
12    8806  Abhishek   5.50     UNI    x  # twice
13    6841     Blake   8.00  UNKING  NaN
14    6841     Blake   8.00  UNKING  NaN
15    6841     Blake   8.00  UNKING  NaN
16    6841     Blake   8.00  UNKING  NaN

赞(0）回复(0）举报 2023-02-20

我来回答

pandas python panda Dataframe ，添加列并标记调整和插入的行

1条答案

相关问题

热门标签

最新问答