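I am struggling to find the bug in my code and would appreciate your advice on how to fix it and move forward. Essentially, I am trying to compute the cumulative sum of a Pandas DataFrame column, with the condition that the cumulative sum resets to 0 whenever it falls below zero. The DataFrame consists of product type / activity / quantity (buy: positive values, sell: negative values). I have provided code to build a mock DataFrame, along with the code I use to compute the cumulative sum. However, I am not getting the output I expect. The table also includes two additional columns (desired_output and py_output): the former is the result I expect, and the latter is the output I see when I run my code in Python. I use the following snippet to get the cumulative sum of the ['quantity'] column: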
neg = df['quantity'] < 0
df['py_output'] = df['quantity'].groupby([neg[::-1].cumsum(),df['product']]).cumsum().clip(0)
Any advice or suggestions on my mistake, and on what I can do to get the correct output, would be greatly appreciated :-)
import pandas as pd
data = [['Product-1', 'Time-1', '1. BUY', 1395, 1395]
, ['Product-1', 'Time-2', '2. SELL', -9684, 0]
, ['Product-1', 'Time-3', '1. BUY', 1352, 1352]
, ['Product-1', 'Time-4', '2. SELL', -1348, 4]
, ['Product-1', 'Time-5', '1. BUY', 1951, 1955]
, ['Product-1', 'Time-6', '2. SELL', -1947, 8]
, ['Product-1', 'Time-7', '1. BUY', 2554, 2562]
, ['Product-1', 'Time-8', '1. BUY', 714, 3276]
, ['Product-1', 'Time-9', '1. BUY', 445, 3721]
, ['Product-1', 'Time-10', '1. BUY', 2948, 6669]
, ['Product-1', 'Time-11', '1. BUY', 1995, 8664]
, ['Product-1', 'Time-12', '2. SELL', -4161, 4503]
, ['Product-1', 'Time-13', '2. SELL', -4161, 342]
, ['Product-1', 'Time-14', '2. SELL', -2895, 0]
, ['Product-1', 'Time-15', '1. BUY', 186, 186]
, ['Product-1', 'Time-16', '1. BUY', 2646, 2832]
, ['Product-1', 'Time-17', '1. BUY', 2594, 5426]
, ['Product-1', 'Time-18', '2. SELL', -3202, 2224]
, ['Product-1', 'Time-19', '1. BUY', 4170, 6394]
, ['Product-1', 'Time-20', '1. BUY', 1766, 8160]
, ['Product-1', 'Time-21', '2. SELL', -4403, 3757]
, ['Product-1', 'Time-22', '2. SELL', -3523, 234]
, ['Product-1', 'Time-23', '1. BUY', 1403, 1637]
, ['Product-1', 'Time-24', '1. BUY', 1566, 3203]
, ['Product-1', 'Time-25', '2. SELL', -1357, 1846]
, ['Product-1', 'Time-26', '2. SELL', -1566, 280]
, ['Product-1', 'Time-27', '1. BUY', 791, 1071]
, ['Product-1', 'Time-28', '1. BUY', 2384, 3455]
, ['Product-1', 'Time-29', '1. BUY', 1292, 4747]
, ['Product-1', 'Time-30', '1. BUY', 1343, 6090]
, ['Product-1', 'Time-31', '1. BUY', 322, 6412]
, ['Product-2', 'Time-1', '1. BUY', 1248, 1248]
, ['Product-2', 'Time-2', '1. BUY', 3276, 4524]
, ['Product-2', 'Time-3', '1. BUY', 707, 5231]
, ['Product-2', 'Time-4', '2. SELL', -3534, 1697]
, ['Product-2', 'Time-5', '1. BUY', 1358, 3055]
, ['Product-2', 'Time-6', '1. BUY', 253, 3308]
, ['Product-2', 'Time-7', '2. SELL', -1082, 2226]
, ['Product-2', 'Time-8', '1. BUY', 238, 2464]
, ['Product-2', 'Time-9', '1. BUY', 371, 2835]]
cols = ['product', 'time', 'activity', 'quantity', 'desired_output']
df = pd.DataFrame(data, columns=cols)
neg = df['quantity'] < 0
df['py_output'] = df['quantity'].groupby([neg[::-1].cumsum(),df['product']]).cumsum().clip(0)
print(df)
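I have gone through a large number of references, including the Stack Overflow threads below. Unfortunately, I have not been able to find a solution that gives me the correct answer.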
Python Pandas groupby limited cumulative sum
Cumsum on Pandas DF with reset to zero for negative cumulative values
1 Answer
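If performance/speed/efficiency is not very important to you, try a simple for loop that keeps a running total and resets it to 0 whenever it would drop below zero. To compute the sum separately for each product, use groupby together with transform.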
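The code that originally accompanied this answer is not preserved here, so the following is only a minimal sketch of the approach it describes, not the answerer's exact code. The helper name `cumsum_floor_zero` is hypothetical, and the sample frame uses just the first four rows of each product from the question to keep it short.

```python
import pandas as pd


def cumsum_floor_zero(values):
    """Running total that is reset to 0 whenever it would fall below 0.

    Hypothetical helper name, chosen for illustration only.
    """
    total = 0
    out = []
    for v in values:
        # Add the current quantity, but never let the total go negative.
        total = max(total + v, 0)
        out.append(total)
    # Keep the group's index so the result aligns with the original rows.
    return pd.Series(out, index=values.index)


# First four rows of each product from the question's mock data.
df = pd.DataFrame({
    'product': ['Product-1'] * 4 + ['Product-2'] * 4,
    'quantity': [1395, -9684, 1352, -1348, 1248, 3276, 707, -3534],
})

# Apply the loop separately to each product's quantities.
df['py_output'] = df.groupby('product')['quantity'].transform(cumsum_floor_zero)
print(df)
```

On these rows the new column matches the desired_output values from the question (1395, 0, 1352, 4 for Product-1 and 1248, 4524, 5231, 1697 for Product-2). The loop is plain Python, so it may be slow on very large frames, which is the trade-off the answer mentions.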