Pandas: how to group data between a cumulative value reaching 0

Posted by xurqigkl on 2022-12-09 in Other

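I'm new to programming and jumped in with Python; I've heard it's best to learn as you go. So I've been playing with a dataset and have most of my code working, but I've learned that pandas has many convenience functions I haven't been using.
I want to group the data each time the cumulative value reaches 0: the row that reaches 0 and the rows before it form one group, and the rows that follow, up to the next time the cumulative value hits 0, form the next group. It's easier to see with an example.
The dataset: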

import pandas as pd

d = {'col1': [1, 2, 3, 4, 5, 6, 7, 8], 'col2': [10, 1, -1, -10, 5, 1, -6, 5]}
d = pd.DataFrame(d)

This gives the frame below. I would like to relabel the col1 entries so that all rows up to and including the one that brings the cumulative sum of col2 back to 0 share one group label, with the next row starting a new group.

   col1  col2
0     1    10
1     2     1
2     3    -1
3     4   -10
4     5     5
5     6     1
6     7    -6
7     8     5

Ideally, something like this:

   col1  col2  cumvalue
0     1    10        10
1     1     1        11
2     1    -1        10
3     1   -10         0
4     2     5         5
5     2     1         6
6     2    -6         0
7     3     5         5

I tried df.groupby(), but I just can't find the right syntax to get this! Thanks!

pandas

Source: https://stackoverflow.com/questions/74722557/pandas-how-to-group-data-between-a-cumulative-value-reaching-0

3 answers

fwzugrvs1#

Try:

d["cumvalue"] = d["col2"].cumsum()
d["col1"] = d["cumvalue"].eq(0).cumsum().shift().fillna(0).astype(int) + 1

print(d)

Prints:

   col1  col2  cumvalue
0     1    10        10
1     1     1        11
2     1    -1        10
3     1   -10         0
4     2     5         5
5     2     1         6
6     2    -6         0
7     3     5         5
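
To see why this works, it can help to look at the intermediate Series. The sketch below rebuilds the question's DataFrame and splits the one-liner into named steps; the variable names (`hits_zero`, `zeros_so_far`, `zeros_before`, `group`) are made up for illustration and are not part of the answer.

```python
import pandas as pd

d = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8],
                  'col2': [10, 1, -1, -10, 5, 1, -6, 5]})

cumvalue = d['col2'].cumsum()       # running total of col2
hits_zero = cumvalue.eq(0)          # True on rows where the total is back at 0
zeros_so_far = hits_zero.cumsum()   # zero-hits up to and including each row

# Shift down one row so the row that hits 0 stays in the group it closes;
# the first row has no predecessor, so its NaN is filled with 0.
zeros_before = zeros_so_far.shift().fillna(0).astype(int)
group = zeros_before + 1            # 1-based group label

print(pd.DataFrame({'cumvalue': cumvalue,
                    'hits_zero': hits_zero,
                    'zeros_so_far': zeros_so_far,
                    'group': group}))
```

The `shift()` is the key step: without it, the row that reaches 0 would already carry the next group's number.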

Posted 2022-12-09

u3r8eeie2#

This can be done by looping over each row of the DataFrame:

import pandas as pd

d = {'col1': [1, 2, 3, 4, 5, 6,7,8], 'col2': [10, 1, -1, -10, 5, 1,-6,5] }
d = pd.DataFrame(d)

group_index = 1
cumvalue = 0

cumvalue_list = []
groupindex_list = []
for index, row in d.iterrows():
    groupindex_list.append(group_index)
    cumvalue = cumvalue + row['col2']
    cumvalue_list.append(cumvalue)
    
    if cumvalue == 0:
        group_index = group_index + 1
    
d['col1'] = groupindex_list
d['cumvalue'] = cumvalue_list

print(d)

Prints:

   col1  col2  cumvalue
0     1    10        10
1     1     1        11
2     1    -1        10
3     1   -10         0
4     2     5         5
5     2     1         6
6     2    -6         0
7     3     5         5
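
`iterrows()` builds a Series for every row, which gets slow on large frames. Below is a minimal sketch of the same loop over the raw column values instead. The function name `label_until_zero` is hypothetical, and the sketch assumes integer data; with floats, an exact `== 0` test may miss because of rounding.

```python
import pandas as pd

def label_until_zero(values):
    """Return (group labels, running totals); a new group starts after the total hits 0."""
    labels, totals = [], []
    group, total = 1, 0
    for v in values:
        labels.append(group)   # the row that reaches 0 still belongs to the current group
        total += v
        totals.append(total)
        if total == 0:
            group += 1         # the next row opens a new group
    return labels, totals

d = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8],
                  'col2': [10, 1, -1, -10, 5, 1, -6, 5]})
d['col1'], d['cumvalue'] = label_until_zero(d['col2'].tolist())
print(d)
```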

Posted 2022-12-09

8ehkhllq3#

You can do the following (here the question's DataFrame is named df rather than d):

df['cumvalue'] = df['col2'].cumsum()
df['col1'] = df.groupby(df['cumvalue']==0).cumcount() + 1
df['col1'] = df['col1'].mask(df['cumvalue'] !=0).bfill().ffill().astype(int)

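print(df) gives the following. Note that the last row, which comes after the final zero, is forward-filled into group 2 rather than starting group 3 as in the question's desired output:

   col1  col2  cumvalue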
0     1    10        10
1     1     1        11
2     1    -1        10
3     1   -10         0
4     2     5         5
5     2     1         6
6     2    -6         0
7     2     5         5
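
In the output above the last row is labelled 2, because `ffill()` extends the previous group over any rows that follow the final zero. If those trailing rows should open a new group instead, one possible adjustment is to fill them with the next unused number. This is a sketch, not the answerer's code; `is_zero`, `labels`, and `next_group` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8],
                   'col2': [10, 1, -1, -10, 5, 1, -6, 5]})

df['cumvalue'] = df['col2'].cumsum()
is_zero = df['cumvalue'].eq(0)

# Number the zero rows 1, 2, ... and leave every other row as NaN.
labels = is_zero.cumsum().where(is_zero)
# Back-fill so each row takes the number of the zero row that closes its group.
labels = labels.bfill()
# Rows after the last zero are still NaN: give them the next unused number.
next_group = int(is_zero.sum()) + 1
df['col1'] = labels.fillna(next_group).astype(int)

print(df)
```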

Posted 2022-12-09
