Count consecutive boolean values in a Python/pandas array for the whole subset

Posted by beq87vna on 2023-04-10 in Python


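I am looking for a way to group consecutive identical values in a pandas DataFrame and apply an operation such as count or max to each group.
For example, if I have this column in df: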

The result needs to be:

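Why: we start with two 0s, followed by three 1s, ...
What I need is similar to this answer, but I need the same value for every element in the group.
A preferred answer would show how to group consecutive identical elements and apply an aggregation function to each group. That way I could even compute the maximum: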

my_column    other_value
0        0           7
1        0           4
2        1           1
3        1           0
4        1           5
5        0           1
6        0           1
7        0           2
8        0           8
9        1           1
10       1           0
11       0           2

The result would be

pandas

Source: https://stackoverflow.com/questions/75946126/count-consecutive-boolean-values-in-python-pandas-array-for-whole-subset

2 Answers


p1iqtdky #1

You can use:

g = df["my_column"].ne(df["my_column"].shift()).cumsum()

out = df.groupby(g)["my_column"].transform("count")

Output:

print(out)

    my_column
0           2
1           2
2           3
3           3
4           3
5           4
6           4
7           4
8           4
9           2
10          2
11          1

Note: to get the maximum instead, use df.groupby(g)["other_value"].transform("max").
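
For reference, here is a minimal self-contained sketch of the approach above, run on the sample data from the question. The output column names `run_length` and `run_max` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "my_column":   [0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0],
    "other_value": [7, 4, 1, 0, 5, 1, 1, 2, 8, 1, 0, 2],
})

# A new run starts wherever the value differs from the previous row;
# the cumulative sum of those starts gives every run its own id.
g = df["my_column"].ne(df["my_column"].shift()).cumsum()

# transform() broadcasts the per-run statistic back onto every row.
# "run_length" and "run_max" are illustrative names, not from the question.
df["run_length"] = df.groupby(g)["my_column"].transform("count")
df["run_max"] = df.groupby(g)["other_value"].transform("max")

print(df)
```

Because `transform` returns a result aligned with the original index, both statistics can be assigned straight back as new columns.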

2023-04-10

6ie5vjzr #2

If you check the linked answer, it contains the exact method for grouping by consecutive values:

(y != y.shift()).cumsum()

So if you create consecutive groups for the column my_column, the output is:

g = df["my_column"].ne(df["my_column"].shift()).cumsum()

print (g)
0     1
1     1
2     2
3     2
4     2
5     3
6     3
7     3
8     3
9     4
10    4
11    5
Name: my_column, dtype: int32
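
To see why this expression labels each run, the one-liner can be broken into its intermediate steps. This is only an illustrative sketch; the names `prev`, `is_start`, `run_id` and `steps` are hypothetical and do not appear in the answer.

```python
import pandas as pd

# The my_column values from the question.
s = pd.Series([0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0], name="my_column")

prev = s.shift()            # value of the previous row; NaN for the first row
is_start = s.ne(prev)       # True where a new run begins (the first row always is)
run_id = is_start.cumsum()  # running count of run starts, used as the group label

# Side-by-side view of the intermediate steps.
steps = pd.DataFrame(
    {"value": s, "prev": prev, "is_start": is_start, "run_id": run_id}
)
print(steps)
```

The first row compares against NaN, which is never equal to anything, so it always starts run 1.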

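You can use GroupBy.transform with GroupBy.size to count the values in each group.
If you need a one-column DataFrame, add Series.to_frame.

Note: DataFrameGroupBy.count counts values while ignoring missing values; it works here because there are no NaNs.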

df1 = df.groupby(g)['my_column'].transform('size').to_frame()
print (df1)
    my_column
0           2
1           2
2           3
3           3
4           3
5           4
6           4
7           4
8           4
9           2
10          2
11          1
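
The difference between size and count only shows up when missing values are present. The sketch below uses a small hypothetical column with one NaN and hand-written group ids (both are assumptions for illustration, not data from the question).

```python
import numpy as np
import pandas as pd

# Hypothetical column containing one missing value.
s = pd.Series([0.0, np.nan, 1.0, 1.0], name="my_column")
# Hypothetical, hand-written group ids (two groups of two rows each).
key = pd.Series([1, 1, 2, 2])

sizes = s.groupby(key).transform("size")    # counts every row in the group
counts = s.groupby(key).transform("count")  # counts only non-missing rows

print(pd.DataFrame({"value": s, "size": sizes, "count": counts}))
```

In the first group, size reports 2 rows while count reports 1, because the NaN is skipped.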

Or use Series.map with Series.value_counts:

df1 = g.map(g.value_counts()).to_frame()
print (df1)
    my_column
0           2
1           2
2           3
3           3
4           3
5           4
6           4
7           4
8           4
9           2
10          2
11          1

The solution for the second case (the maximum) is similar:

g = df["my_column"].ne(df["my_column"].shift()).cumsum()

df1 = df.groupby(g)['other_value'].transform('max').to_frame(name='result')
print (df1)
    result
0        7
1        7
2        5
3        5
4        5
5        8
6        8
7        8
8        8
9        1
10       1
11       2
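
If one row per run is wanted instead of a value repeated on every row, the same grouping key can be passed to a plain aggregation. This is a sketch under that assumption; the names `run_id`, `value`, `length`, `max_other` and `summary` are hypothetical.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "my_column":   [0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0],
    "other_value": [7, 4, 1, 0, 5, 1, 1, 2, 8, 1, 0, 2],
})

# Same run labelling as above, renamed so the resulting index
# does not share a name with the my_column column.
g = df["my_column"].ne(df["my_column"].shift()).cumsum().rename("run_id")

# Named aggregation: one output row per run of identical values.
summary = df.groupby(g).agg(
    value=("my_column", "first"),      # the value repeated within the run
    length=("my_column", "size"),      # number of rows in the run
    max_other=("other_value", "max"),  # maximum of other_value within the run
)
print(summary)
```

Using agg collapses each run to a single row, whereas transform keeps the original row count.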

2023-04-10
