Python - Maximum consecutive count of each value in a numpy array

Posted by gcmastyq on 2023-10-19 in Python

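Given a 1-D numpy array, the goal is to find, for each value, the maximum number of times it occurs consecutively. For example, given an array arr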

arr=np.array([1,1,1,2,2,3,4,4,4,4,4,2,4,4])

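I want to return the maximum consecutive count of each number. The result should be a 2-D array whose first column holds each number and whose second column holds that number's maximum consecutive count.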

result=np.array([[1,3],[2,2],[3,1],[4,5]])
Answer #1 (pw9qyyiw)

This is easy to do with pandas:

import numpy as np
import pandas as pd

s = pd.Series(arr)
out = (s.groupby(s.ne(s.shift()).cumsum(), sort=False) # group consecutive values
        .agg(['first', 'size'])                        # get value and run length
        .groupby('first', sort=False)['size'].max()    # max run length per value
        .reset_index().to_numpy()                      # back to numpy
      )
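The grouping key above is the usual run-id trick: `s.ne(s.shift())` is True wherever a value differs from its predecessor (the first element is compared against NaN, so it is True too), and its cumulative sum gives every run its own number. A minimal sketch that prints the intermediate one-row-per-run table for the example array:

```python
import numpy as np
import pandas as pd

arr = np.array([1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4, 2, 4, 4])
s = pd.Series(arr)

# True at the start of each run; cumsum turns that into a run number
run_id = s.ne(s.shift()).cumsum()
# run_id: 1 1 1 2 2 3 4 4 4 4 4 5 6 6

# one row per run: the run's value and its length
runs = s.groupby(run_id, sort=False).agg(['first', 'size'])
print(runs)
```

The second `groupby('first')` in the answer then only has to take the maximum `size` per value over this small table.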

With pure numpy it is a bit more involved:

import numpy as np

arr = np.array([1,1,1,2,2,3,4,4,4,4,4,2,4,4])

# find the last index of each run (positions where the next value differs)
idx = np.nonzero(np.diff(arr))[0]
# array([ 2,  4,  5, 10, 11])

# value of each run
i = np.r_[arr[idx], arr[-1]]
# array([1, 2, 3, 4, 2, 4])

# length of each run
n = np.diff(np.r_[0, idx+1, arr.shape[0]])
# array([3, 2, 1, 5, 1, 2])

# sort runs by value, then by length (np.lexsort uses the last key as the primary key)
order = np.lexsort([n, i])
# array([0, 4, 1, 2, 5, 3])

i2 = i[order]
# array([1, 2, 2, 3, 4, 4])

# mark the last run of each value after sorting, i.e. its longest run
m = np.r_[np.diff(i2)!=0, True]
# array([ True, False,  True,  True, False,  True])

# combine values and their maximum run lengths into a 2-D array
out = np.vstack([i2[m], n[order][m]]).T

Output:

array([[1, 3],
       [2, 2],
       [3, 1],
       [4, 5]])
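A possible variant of the last steps, not part of the original answer: instead of sorting with `np.lexsort`, map each run value to a slot with `np.unique(..., return_inverse=True)` and take the per-value maximum with the unbuffered scatter operation `np.maximum.at`. This sketch recomputes the run values `i` and run lengths `n` the same way as above; note that `np.unique` returns the values in sorted order, not in order of first appearance.

```python
import numpy as np

arr = np.array([1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4, 2, 4, 4])

# run boundaries, run values and run lengths, as in the answer above
idx = np.nonzero(np.diff(arr))[0]
i = np.r_[arr[idx], arr[-1]]
n = np.diff(np.r_[0, idx + 1, arr.shape[0]])

# map every run to the slot of its value (values come back sorted)
values, inv = np.unique(i, return_inverse=True)

# scatter-max: best[inv[k]] = max(best[inv[k]], n[k]) for every run k
best = np.zeros(len(values), dtype=n.dtype)
np.maximum.at(best, inv, n)

out = np.column_stack([values, best])
```

`np.maximum.at` is used rather than `best[inv] = np.maximum(best[inv], n)` because fancy-index assignment keeps only one write per repeated index, whereas `.at` applies every run.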

Using pure Python with itertools.groupby:

from itertools import groupby

out = {}
for k, g in groupby(arr):
    out[k] = max(out.get(k, -1), len(list(g)))

out = list(out.items())

Output: [(1, 3), (2, 2), (3, 1), (4, 5)]
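The question asks for a 2-D numpy array rather than a list of tuples. A small wrapper around the same `groupby` loop, using the hypothetical name `max_run_lengths`, that counts each group with `sum(1 for _ in g)` instead of materializing it as a list and converts the result:

```python
from itertools import groupby

import numpy as np


def max_run_lengths(arr):
    """Return a 2-D array of [value, longest consecutive run] rows,
    ordered by first appearance of each value."""
    best = {}
    for k, g in groupby(arr):
        run = sum(1 for _ in g)  # length of this run without building a list
        best[k] = max(best.get(k, 0), run)
    return np.array(list(best.items()))


print(max_run_lengths(np.array([1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4, 2, 4, 4])))
```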

Timing comparison

Using a random array as input (np.random.randint(1, 5, size=N)).
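A minimal harness for rerunning a comparison like this one might look as follows. It assumes `N = 10_000`, a fixed seed, and 20 calls per method, and it covers only the pure-numpy and `itertools.groupby` answers to stay free of a pandas dependency; the wrapper names `numpy_runs` and `groupby_runs` are hypothetical, and timings will vary with machine and library version.

```python
import timeit
from itertools import groupby

import numpy as np


def numpy_runs(arr):
    # pure-numpy approach from the first answer (rows sorted by value)
    idx = np.nonzero(np.diff(arr))[0]
    i = np.r_[arr[idx], arr[-1]]
    n = np.diff(np.r_[0, idx + 1, arr.shape[0]])
    order = np.lexsort([n, i])
    i2 = i[order]
    m = np.r_[np.diff(i2) != 0, True]
    return np.vstack([i2[m], n[order][m]]).T


def groupby_runs(arr):
    # itertools.groupby approach, sorted by value to match numpy_runs
    out = {}
    for k, g in groupby(arr):
        out[k] = max(out.get(k, 0), sum(1 for _ in g))
    return np.array(sorted(out.items()))


N = 10_000
rng = np.random.default_rng(0)
arr = rng.integers(1, 5, size=N)  # values 1..4, like np.random.randint(1, 5, size=N)

# both approaches must agree before their timings mean anything
assert np.array_equal(numpy_runs(arr), groupby_runs(arr))

for f in (numpy_runs, groupby_runs):
    t = timeit.timeit(lambda: f(arr), number=20)
    print(f"{f.__name__}: {t / 20 * 1e3:.3f} ms per call")
```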

Answer #2 (nafvub8i)

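Iterate over the values while tracking the length of the current run of identical values. When the value changes, update the maximum count of the previous value if the run that just ended is longer, then reset the running count to 1. The maximum counts are stored in a defaultdict and then converted to an array.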

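import numpy as np
from collections import defaultdict

arr=np.array([1,1,1,2,2,3,4,4,4,4,4,2,4,4])

prev = None
max_consecutive_count = defaultdict(int)  # value -> longest run seen so far
running_count = 0

for val in arr:
    # `prev is None` rather than `not prev`, so that a previous value of 0 is handled
    if prev is None or val == prev:
        # first element, or the current run continues
        running_count += 1
    else:
        # the run of `prev` just ended: keep it if it is the longest so far
        if running_count > max_consecutive_count[prev]:
            max_consecutive_count[prev] = running_count
        running_count = 1

    prev = val

# the final run never ends inside the loop, so record it here
if prev is not None and running_count > max_consecutive_count[prev]:
    max_consecutive_count[prev] = running_count

max_consecutive_count_arr = np.array([[k, v] for k, v in max_consecutive_count.items()])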
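Testing `prev is None` rather than `not prev` matters when the array contains 0: `not prev` is also true when the previous value is 0, so the next run would be merged into the run of zeros. A function version of the same loop, using the hypothetical name `max_consecutive`, checked on an input that starts with 0:

```python
from collections import defaultdict

import numpy as np


def max_consecutive(arr):
    """Longest run of each value, as rows [value, count] in the order
    in which each value's first run ends."""
    best = defaultdict(int)
    prev, running = None, 0
    for val in arr:
        if prev is not None and val != prev:
            # the run of `prev` ended: keep it if it is the longest so far
            best[prev] = max(best[prev], running)
            running = 0
        running += 1
        prev = val
    if prev is not None:
        # close the final run, which the loop never sees end
        best[prev] = max(best[prev], running)
    return np.array([[k, v] for k, v in best.items()])


print(max_consecutive(np.array([0, 1, 1, 0, 0, 0])))
```

With the `not prev` condition, the same input would report a run of 3 for the value 1, because the single leading 0 is counted as part of the following run of 1s.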
