pandas: cumulative sum based on conditions in 2 adjacent columns

Posted by e4yzc0pl 9 months ago in Other

Sorry, I'm new to coding/pandas/python.
I'm trying to sum the "Value" column only between a "1" in the adjacent "Buy" column and a "1" in the "Sell" column:
| Date | Buy | Sell | Value | Cumulative sum |
| -- | -- | -- | -- | -- |
| 2023-01-01 | | | 1 | |
| 2023-02-01 | 1 | | 5 | 5 |
| 2019-03-01 | | | 1 | 6 |
| 2019-04-01 | | | 1 | 7 |
| 2019-05-23 | | 1 | 1 | 8 |
| 2019-06-23 | | | 5 | |
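I saw there is a function called cumsum(axis=None, skipna=True, *args, **kwargs) that might be able to do this, but unfortunately I'm not sure how to make it depend on the buy and sell columns.
I don't know how to do this with pandas functions. Would I need to use a loop?
Any advice or hints would be greatly appreciated!!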
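For reference, the cumsum mentioned above skips NaN by default (skipna=True): a NaN entry stays NaN in the result, but the running total carries on past it. Answer 1 below relies on this, since rows that should not count can be blanked out with NaN before summing. A small illustration with made-up values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 5.0, np.nan, 1.0])

# Default skipna=True: the NaN row stays NaN, but the running total continues past it
print(s.cumsum().tolist())              # [1.0, 6.0, nan, 7.0]

# skipna=False: once a NaN is reached, every later total is NaN too
print(s.cumsum(skipna=False).tolist())  # [1.0, 6.0, nan, nan]
```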

Answer 1 (by e4eetjau)

Try to avoid loops. I think this is what you're looking for:

```python
import pandas as pd

df1 = pd.DataFrame({
    'date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06',
             '2023-01-07', '2023-01-08', '2023-01-09', '2023-01-10', '2023-01-11', '2023-01-12'],
    'buy':  [None, 1, None, None, None, None, None, 1, None, None, None, None],
    'sell': [None, None, None, None, 1, None, None, None, None, None, 1, None],
    'value': [1, 5, 1, 1, 1, 5, 1, 5, 1, 1, 1, 5],
})

# make 'sell' negative so it can be told apart from 'buy'
df1['sell'] = df1['sell'] * -1

print(df1)
#           date  buy  sell  value
# 0   2023-01-01  NaN   NaN      1
# 1   2023-01-02  1.0   NaN      5
# 2   2023-01-03  NaN   NaN      1
# 3   2023-01-04  NaN   NaN      1
# 4   2023-01-05  NaN  -1.0      1
# 5   2023-01-06  NaN   NaN      5
# 6   2023-01-07  NaN   NaN      1
# 7   2023-01-08  1.0   NaN      5
# 8   2023-01-09  NaN   NaN      1
# 9   2023-01-10  NaN   NaN      1
# 10  2023-01-11  NaN  -1.0      1
# 11  2023-01-12  NaN   NaN      5

# 'buysell' combines 'buy' and 'sell': 1 = buy, -1 = sell, NaN = neither
df1['buysell'] = df1['buy'].combine_first(df1['sell'])

# ffill() carries each marker down; rows whose most recent marker is a buy (1)
# are set to 1, so the rows between a buy and the next sell get marked
df1.loc[df1['buysell'].ffill() == 1, 'buysell'] = 1

# mask of rows inside a buy/sell window ('buysell' is not NaN)
mask = df1['buysell'].notna()

# copy 'value' into 'buysellvalue' for masked rows only; other rows stay NaN
df1.loc[mask, 'buysellvalue'] = df1.loc[mask, 'value']

# running total; NaN rows are skipped (skipna=True is the default)
df1['cumbuysellvalue'] = df1['buysellvalue'].cumsum()

print(df1)
#           date  buy  sell  value  buysell  buysellvalue  cumbuysellvalue
# 0   2023-01-01  NaN   NaN      1      NaN           NaN              NaN
# 1   2023-01-02  1.0   NaN      5      1.0           5.0              5.0
# 2   2023-01-03  NaN   NaN      1      1.0           1.0              6.0
# 3   2023-01-04  NaN   NaN      1      1.0           1.0              7.0
# 4   2023-01-05  NaN  -1.0      1     -1.0           1.0              8.0
# 5   2023-01-06  NaN   NaN      5      NaN           NaN              NaN
# 6   2023-01-07  NaN   NaN      1      NaN           NaN              NaN
# 7   2023-01-08  1.0   NaN      5      1.0           5.0             13.0
# 8   2023-01-09  NaN   NaN      1      1.0           1.0             14.0
# 9   2023-01-10  NaN   NaN      1      1.0           1.0             15.0
# 10  2023-01-11  NaN  -1.0      1     -1.0           1.0             16.0
# 11  2023-01-12  NaN   NaN      5      NaN           NaN              NaN
```

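The key line in this answer is the ffill() step. A small illustration on a bare marker series (made-up values: 1 = buy, -1 = sell, NaN = neither) shows what it does: forward-filling copies each marker down until the next one, so every row whose most recent marker is a buy becomes 1, sell rows keep -1, and rows after a sell stay NaN.

```python
import numpy as np
import pandas as pd

# Marker series in the same form as the answer's 'buysell' column
buysell = pd.Series([np.nan, 1, np.nan, np.nan, -1, np.nan, 1, np.nan, -1])

# Forward-fill: each marker is copied down until the next marker appears
filled = buysell.ffill()
print(filled.tolist())
# [nan, 1.0, 1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0]

# Only rows whose most recent marker is a buy are overwritten with 1
buysell.loc[filled == 1] = 1
print(buysell.tolist())
# [nan, 1.0, 1.0, 1.0, -1.0, nan, 1.0, 1.0, -1.0]
```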

Answer 2 (by dfddblmv)

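It looks like you want a cumulative sum between each buy/sell pair. We can put each buy/sell pair in its own group and then take a cumulative sum within each group. For the groupby, we create a temporary buy_sell_group column that identifies which buy/sell pair each row belongs to.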
Suppose we have the table below:
(image: the example input table)
Then we can assign a buy_sell_group number to the rows between each pair of buy/sell cells whose value is 1.0:

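```python
import numpy as np
import pandas as pd

# df is the input frame shown above, with 1.0 markers in 'buy'/'sell' and NaN elsewhere.
# The index labels b and s are used as list positions, so this assumes a default RangeIndex.
buy_sell_group = [np.nan for _ in range(len(df))]
i = 1
# pair the k-th buy row with the k-th sell row
for b, s in zip(df.loc[df.buy == 1.0].index, df.loc[df.sell == 1.0].index):
    # rows b..s (inclusive) belong to pair i
    buy_sell_group[b:s+1] = [i] * (s+1-b)
    i += 1
df['buy_sell_group'] = buy_sell_group
```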

This results in the following frame:

(image: the table with the added buy_sell_group column)
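The loop can also be replaced by a vectorized labelling. This is a sketch under the same two assumptions listed at the end of this answer (buys and sells strictly alternate, markers are 1.0), using the data from answer 1. The running count of buys gives each pair its number, and comparing it with the count of sells seen before each row tells whether the row lies inside a pair.

```python
import numpy as np
import pandas as pd

# Same buy/sell/value data as answer 1, with 1.0 markers and NaN elsewhere
df = pd.DataFrame({
    'buy':  [np.nan, 1, np.nan, np.nan, np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan, np.nan],
    'sell': [np.nan, np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan, np.nan, np.nan, 1, np.nan],
    'value': [1, 5, 1, 1, 1, 5, 1, 5, 1, 1, 1, 5],
})

# Buys seen so far, including this row: 1 for the first pair, 2 for the second, ...
n_buys = df['buy'].eq(1.0).cumsum()
# Sells seen strictly before this row, so the sell row itself still counts as inside its pair
n_sells_before = df['sell'].eq(1.0).cumsum().shift(fill_value=0)

# Inside a pair when more buys than earlier sells have been seen; NaN elsewhere, like the loop
df['buy_sell_group'] = n_buys.where(n_buys > n_sells_before)
print(df['buy_sell_group'].tolist())
# [nan, 1.0, 1.0, 1.0, 1.0, nan, nan, 2.0, 2.0, 2.0, 2.0, nan]
```

Because cumsum and shift work in row order and where aligns on the index, this version does not need the default RangeIndex that the loop's list slicing relies on.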



Then we can group by the 'buy_sell_group' column and take the cumsum of the 'value' column:

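```python
# cumulative sum of 'value' within each buy/sell group
df['cumsum'] = df.groupby('buy_sell_group')['value'].cumsum()
# drop the temporary helper column
df = df.drop('buy_sell_group', axis=1)
```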
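Rows outside every pair have NaN in buy_sell_group, and groupby leaves NaN keys out by default (dropna=True), which is why those rows end up NaN in the cumsum column. Because each pair is its own group, the total also restarts at every buy, unlike answer 1, where it keeps running across pairs. A small check with a made-up frame that already has the group column:

```python
import numpy as np
import pandas as pd

# Made-up frame with a ready-made buy_sell_group column (NaN = outside any pair)
df = pd.DataFrame({
    'value': [1, 5, 1, 1, 1, 5, 5, 1],
    'buy_sell_group': [np.nan, 1, 1, 1, 1, np.nan, 2, 2],
})

# NaN-keyed rows get no group result; assigning back aligns on the index,
# so those rows are left as NaN
df['cumsum'] = df.groupby('buy_sell_group')['value'].cumsum()
print(df['cumsum'].tolist())
# [nan, 5.0, 6.0, 7.0, 8.0, nan, 5.0, 6.0]
```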


The final frame:

(image: the final table with the cumsum column)

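Note that the implementation above assumes that:

  • every buy = 1 has a corresponding sell = 1, and
  • there are no other buy/sell = 1 values inside a buy/sell pair.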
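Both assumptions can be checked before grouping. Below is a sketch of a hypothetical helper, pairs_are_well_formed (not part of pandas or of the answer), that reduces the buy/sell columns to the sequence of marker rows and tests that it reads buy, sell, buy, sell, and so on.

```python
import numpy as np
import pandas as pd


def pairs_are_well_formed(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True when buy == 1 and sell == 1 markers strictly
    alternate, starting with a buy and ending with a sell."""
    buy = df['buy'].eq(1)
    sell = df['sell'].eq(1)
    # A row that is both a buy and a sell cannot be assigned to a single pair
    if (buy & sell).any():
        return False
    # Keep only marker rows, in order: 'B' for a buy, 'S' for a sell
    seq = ''.join('B' if b else 'S' for b, s in zip(buy, sell) if b or s)
    # Every buy must be followed by its own sell before the next buy
    return seq == 'BS' * (len(seq) // 2)


ok = pd.DataFrame({'buy': [np.nan, 1, np.nan, np.nan], 'sell': [np.nan, np.nan, np.nan, 1]})
bad = pd.DataFrame({'buy': [1, 1, np.nan, np.nan], 'sell': [np.nan, np.nan, 1, np.nan]})
print(pairs_are_well_formed(ok))   # True
print(pairs_are_well_formed(bad))  # False
```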
