Reset pandas cumulative sum when a condition is not met [duplicate]

egdjgwm8 posted on 2023-05-27 in Other

This question already has answers here:

Make Pandas groupby act similarly to itertools groupby (3 answers)
How can I do a sequential count based on column value and timestamp in pandas? (3 answers)
Closed 11 hours ago.
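I went through various Stack Overflow questions and am finally posting this because I can't solve one of the problems I'm facing. I have a dataframe like the following: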

A        B          C
group1   group1_c   12
group1   group1_c   12
group1   group1_c   12
group1   group1_c   1
group1   group1_c   12
group1   group1_c   12

I need to compare each row with the previous one and keep a cumulative sum for as long as the values match. To do this I used:

df['cumul'] = df['C'].eq(df.groupby(['A','B'])['C'].shift(1).ffill()).groupby([df['A'], df['B']]).cumsum()

After running it, I get:

A        B          C    Cumul
group1   group1_c   12   0
group1   group1_c   12   1
group1   group1_c   12   2
group1   group1_c   1    2
group1   group1_c   12   3 
group1   group1_c   12   3
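A self-contained reproduction of the attempt above, assuming the sample frame is typed in by hand and with the bracket typo in the `groupby` call fixed. It shows the core problem: `cumsum` only ever adds, so a row that fails the comparison just contributes 0 and the running total carries on from where it was instead of dropping back to 0.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": ["group1"] * 6,
    "B": ["group1_c"] * 6,
    "C": [12, 12, 12, 1, 12, 12],
})

# The attempt: True where C equals the previous C within the same (A, B) group
prev_c = df.groupby(["A", "B"])["C"].shift(1).ffill()
matches = df["C"].eq(prev_c)

# A plain grouped cumsum keeps accumulating across mismatches instead of resetting
df["cumul"] = matches.groupby([df["A"], df["B"]]).cumsum()
print(df)
```

After the mismatch at index 3 the `cumul` column never returns to 0, which is exactly the behaviour the question wants to change.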

I want the count to reset to 0 whenever the condition is not met. Expected result:

A        B          C    Cumul
group1   group1_c   12   0
group1   group1_c   12   1
group1   group1_c   12   2
group1   group1_c   1    0
group1   group1_c   12   0 
group1   group1_c   12   1

Please advise. Thanks.
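The expected column follows a simple rule: each run of equal consecutive C values is numbered 0, 1, 2, ... from its start. A plain-Python sketch of that rule (no pandas) using `itertools.groupby`, which, unlike pandas `groupby`, only groups adjacent equal items; that is the behaviour the first linked duplicate asks about. `run_counter` is a hypothetical helper name, and the sketch covers column C only, not the A/B grouping.

```python
from itertools import groupby


def run_counter(values):
    """Number each item within its run of equal consecutive values: 0, 1, 2, ..."""
    out = []
    # With no key, itertools.groupby starts a new group only where the value changes
    for _, run in groupby(values):
        out.extend(range(len(list(run))))
    return out


print(run_counter([12, 12, 12, 1, 12, 12]))  # [0, 1, 2, 0, 0, 1]
```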

tjjdgumg (answer #1)

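If you need to count the rows within each group of consecutive equal values in column C, compare C with the previous row using Series.ne and Series.shift, take the cumulative sum of that to give each run its own label, and finally number the rows in each run with the counter GroupBy.cumcount: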

df['cumul'] = df.groupby(df['C'].ne(df['C'].shift()).cumsum()).cumcount()

print (df)
        A         B   C  cumul
0  group1  group1_c  12      0
1  group1  group1_c  12      1
2  group1  group1_c  12      2
3  group1  group1_c   1      0
4  group1  group1_c  12      0
5  group1  group1_c  12      1
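To see why this works, here is the same one-liner split into its intermediate Series, on the question's data rebuilt by hand. The last computed column is an extra check, not part of the answer: summing the "matches the previous row" flags separately within each run (the reset-on-mismatch cumsum the question asked for) gives the same column as `cumcount`.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["group1"] * 6,
    "B": ["group1_c"] * 6,
    "C": [12, 12, 12, 1, 12, 12],
})

# 1. True wherever C differs from the row above, i.e. where a new run starts
#    (the first row compares against NaN, so it is always True)
starts = df["C"].ne(df["C"].shift())
# 2. Running count of run starts gives every run its own integer label
run_id = starts.cumsum()
# 3. Position of each row inside its run: 0, 1, 2, ...
df["cumul"] = df.groupby(run_id).cumcount()

# Equivalent formulation: a cumsum of the "same as previous" flags, restarted per run
df["cumul_cumsum"] = (~starts).astype(int).groupby(run_id).cumsum()

print(df.assign(starts=starts, run_id=run_id))
```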

If you need this per A, B group, you can also add both columns to the groupby:

print (df)
        A         B   C
0  group1  group1_c  12
1  group1  group2_c  12 <-changed groups
2  group1  group2_c  12 <-changed groups
3  group1  group1_c   1
4  group1  group1_c  12
5  group1  group1_c  12

s = df['C'].ne(df['C'].shift()).cumsum()
df['cumul'] = df.groupby([df['A'],df['B'], s]).cumcount()

df['cumul1'] = df.groupby(df['C'].ne(df['C'].shift()).cumsum()).cumcount()
print (df)
        A         B   C  cumul  cumul1
0  group1  group1_c  12      0       0
1  group1  group2_c  12      0       1
2  group1  group2_c  12      1       2
3  group1  group1_c   1      0       0
4  group1  group1_c  12      0       0
5  group1  group1_c  12      1       1

Alternative solution:

s = df[['A','B','C']].ne(df[['A','B','C']].shift()).any(axis=1).cumsum()
df['cumul'] = df.groupby(s).cumcount()
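The two per-group variants in this answer agree on the example data, but they are not interchangeable in general. A small sketch with made-up data (not from the question): when rows of the same A, B pair are separated by a row from another pair while C stays constant, the `groupby([df['A'], df['B'], s])` variant keeps counting across the interruption, whereas the `any(axis=1)` alternative treats the interruption as the end of a run.

```python
import pandas as pd

# Hypothetical data: the (group1, x) rows are separated by a (group1, y) row,
# while C never changes
df = pd.DataFrame({
    "A": ["group1", "group1", "group1"],
    "B": ["x", "y", "x"],
    "C": [5, 5, 5],
})

# First variant: runs are defined on C alone, then split by A and B
s = df["C"].ne(df["C"].shift()).cumsum()
df["cumul"] = df.groupby([df["A"], df["B"], s]).cumcount()

# Alternative variant: a change in any of A, B or C starts a new run
cols = ["A", "B", "C"]
s_any = df[cols].ne(df[cols].shift()).any(axis=1).cumsum()
df["cumul_alt"] = df.groupby(s_any).cumcount()

print(df)
```

Which behaviour is appropriate depends on whether a row from another group should end a run; the question's sample data does not distinguish the two.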
