Grouping consecutive True values in a 1-D numpy array

shstlldc  posted on 2023-05-07  in Other

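Suppose we have a boolean array x = np.array([True, True, False, True, False]). It contains two groups of consecutive True values. What I want is a list l of boolean arrays in which each array contains exactly one group of consecutive True values. In particular, x should be identical to y as defined by: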

y = np.zeros_like(x)
for e in l:
    y = y|e
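To make the requirement concrete, here is a small sketch with the expected list l written out by hand for the example x (the hand-written masks are an illustration, not output from any of the functions discussed here):

```python
import numpy as np

x = np.array([True, True, False, True, False])

# Desired result for this x, written out by hand: one mask per run of True.
l = [
    np.array([True, True, False, False, False]),
    np.array([False, False, False, True, False]),
]

# OR-ing all masks together should give back x.
y = np.zeros_like(x)
for e in l:
    y = y | e

print(np.array_equal(x, y))  # True
```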

So far, my only successful attempt uses the consecutive function from https://stackoverflow.com/a/7353335/4755229:

def consecutive_bools(bool_input):
    consecutive_idx = consecutive(np.argwhere(bool_input).flatten())
    ret = [np.zeros_like(bool_input) for i in range(len(consecutive_idx))]
    for i, idx in enumerate(consecutive_idx):
        ret[i][idx] = True
    return ret

This seems overly complicated. Is there a better (more concise, possibly faster) way to do this?
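The consecutive helper is not reproduced in the question. The sketch below assumes it has the commonly used form that splits a sorted index array wherever the gap between neighbours differs from stepsize; treat the exact signature as an assumption rather than a quotation of the linked answer.

```python
import numpy as np

def consecutive(data, stepsize=1):
    # Assumed definition: split wherever neighbouring values are not
    # exactly `stepsize` apart.
    return np.split(data, np.where(np.diff(data) != stepsize)[0] + 1)

x = np.array([True, True, False, True, False])

# Indices of the True entries, grouped into runs of consecutive integers.
runs = consecutive(np.argwhere(x).flatten())
print([r.tolist() for r in runs])  # [[0, 1], [3]]
```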

mqkwyuun  #1

Consider the following:

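import numpy as np

x = np.array([True, True, False, True, False])

# Pad with False in front and at the end, then XOR to mark every True/False switch.
# np.append puts the trailing False after the last element.
idx, = np.where(np.insert(x, 0, False) ^ np.append(x, False))

l = [np.zeros_like(x), np.zeros_like(x)]
l[0][idx[0]:idx[1]] = True
l[1][idx[2]:idx[3]] = True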

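The idea is that the elements of idx are the indices at which the array switches from True to False or vice versa. Since there are exactly 2 groups of consecutive True values, idx has exactly 4 elements.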
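The switch indices can be inspected step by step. This sketch pads with np.concatenate so that the trailing False lands after the last element; the intermediate names shifted_right and shifted_left are made up for illustration.

```python
import numpy as np

x = np.array([True, True, False, True, False])

shifted_right = np.concatenate(([False], x))  # x preceded by a False
shifted_left = np.concatenate((x, [False]))   # x followed by a False

# Position i is True when x[i-1] and x[i] differ (with False outside the array).
switches = shifted_right ^ shifted_left
idx, = np.where(switches)

print(switches.tolist())  # [True, False, True, True, True, False]
print(idx.tolist())       # [0, 2, 3, 4]

# Even entries are run starts, odd entries are exclusive run stops.
print(idx.reshape(-1, 2).tolist())  # [[0, 2], [3, 4]]
```

Read as pairs, the first run covers x[0:2] and the second covers x[3:4].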
For an arbitrary number of consecutive groups:

idx, = np.where(np.insert(x, 0, False) ^ np.append(x, False))

# idx alternates start, stop, start, stop, ...; split it into (start, stop) pairs.
l = [np.zeros_like(x) for _ in range(len(idx) // 2)]
for a, p in zip(l, np.split(idx, np.arange(2, len(idx), 2))):
    a[slice(*p)] = True
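The general case can also be wrapped in a function. The following is a sketch, not the answer's own code: the name split_true_runs is hypothetical, and it pairs up starts and stops with reshape(-1, 2) instead of np.split. It also covers inputs that end in True and inputs with no True at all.

```python
import numpy as np

def split_true_runs(x):  # hypothetical name, not from the answer
    x = np.asarray(x, dtype=bool)
    padded = np.concatenate(([False], x, [False]))
    # Positions where neighbouring values differ: run starts and exclusive stops.
    idx = np.flatnonzero(padded[1:] != padded[:-1])
    masks = []
    for start, stop in idx.reshape(-1, 2):
        mask = np.zeros_like(x)
        mask[start:stop] = True
        masks.append(mask)
    return masks

for m in split_true_runs([True, False, True, True]):
    print(m.tolist())
# [True, False, False, False]
# [False, False, True, True]
```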

zengzsys  #2

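An interesting approach is to compute the start and stop of each segment, then build an index array with np.arange(x.size). Compare it against all starts with >= and against all stops with <. The logical AND of the two results gives the desired output: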

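def my_consecutive_bools(ar):
    # Nonzero positions alternate between the start and the (exclusive) stop of each run.
    indices, = np.concatenate([ar[:1], ar[:-1] != ar[1:], ar[-1:]]).nonzero()
    arange = np.arange(ar.size)
    # Broadcast (n_runs, 1) against (ar.size,) to get one row per run.
    return np.logical_and(arange >= indices[::2, None],
                          arange < indices[1::2, None])

>>> x = np.array([True, True, False, True, False])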
>>> my_consecutive_bools(x)
array([[ True,  True, False, False, False],
       [False, False, False,  True, False]])
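The broadcasting step can be unpacked as follows. This is an illustrative sketch of the same computation with the intermediate shapes spelled out; the names starts, stops and masks are introduced here only for readability.

```python
import numpy as np

x = np.array([True, True, False, True, False])
indices, = np.concatenate([x[:1], x[:-1] != x[1:], x[-1:]]).nonzero()

starts = indices[::2, None]   # shape (2, 1): [[0], [3]]
stops = indices[1::2, None]   # shape (2, 1): [[2], [4]]
arange = np.arange(x.size)    # shape (5,)

# (5,) against (2, 1) broadcasts to (2, 5): one row per run of True.
masks = (arange >= starts) & (arange < stops)

print(masks.shape)                           # (2, 5)
print(np.array_equal(masks.any(axis=0), x))  # True
```

OR-ing the rows together (any along axis 0) recovers x, which is the property the question asks for.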

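This approach works well on small arrays, but it evaluates every (group, position) pair, so its cost grows with the number of groups times the array length. For large arrays, you can simply iterate over the starts and stops and assign: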

def my_consecutive_bools_loop(ar):
    indices, = np.concatenate([ar[:1], ar[:-1] != ar[1:], ar[-1:]]).nonzero()
    result = np.zeros((indices.size // 2, ar.size), bool)
    for row, start, stop in zip(result, indices[::2], indices[1::2]):
        row[start:stop] = True
    return result
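To get a feel for why the broadcast version scales poorly, one can count the groups in a large random array and estimate the size of each intermediate comparison result without allocating it. This is a rough sketch; the seed and the variable names are arbitrary choices, and the exact numbers depend on the random draw.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed chosen for this sketch
big = rng.choice([True, False], 100_000, p=[0.8, 0.2])

# Number of runs of True = number of False->True transitions,
# plus one if the array already starts with True.
n_groups = int(big[0]) + np.count_nonzero(~big[:-1] & big[1:])

# Each broadcast comparison yields an (n_groups, big.size) boolean array,
# stored at one byte per element.
cells = n_groups * big.size
print(n_groups, cells, f"{cells / 1e9:.2f} GB per comparison")
```

With these probabilities the expected number of groups is on the order of 16,000, so each comparison touches on the order of a billion cells. The loop version returns a result of the same shape, but it only writes the True cells into a zero-initialised array.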

A simple benchmark:

In [_]: rng = np.random.default_rng()

In [_]: small = rng.choice([True, False], 100, p=[0.8, 0.2])

In [_]: big = rng.choice([True, False], 100000, p=[0.8, 0.2])

In [_]: %timeit consecutive_bools(small)
109 µs ± 286 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [_]: %timeit my_consecutive_bools(small)
13.3 µs ± 46.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [_]: %timeit my_consecutive_bools_loop(small)
20 µs ± 122 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [_]: %timeit consecutive_bools(big)
699 ms ± 6.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [_]: %timeit my_consecutive_bools(big)
2.98 s ± 17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [_]: %timeit my_consecutive_bools_loop(big)
33.4 ms ± 1.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
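Outside IPython, a similar comparison can be run with the standard timeit module. The sketch below first checks that the two implementations agree on a small random input and then times them; the seed and repetition count are arbitrary, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

def my_consecutive_bools(ar):
    indices, = np.concatenate([ar[:1], ar[:-1] != ar[1:], ar[-1:]]).nonzero()
    arange = np.arange(ar.size)
    return np.logical_and(arange >= indices[::2, None],
                          arange < indices[1::2, None])

def my_consecutive_bools_loop(ar):
    indices, = np.concatenate([ar[:1], ar[:-1] != ar[1:], ar[-1:]]).nonzero()
    result = np.zeros((indices.size // 2, ar.size), bool)
    for row, start, stop in zip(result, indices[::2], indices[1::2]):
        row[start:stop] = True
    return result

rng = np.random.default_rng(0)  # fixed seed chosen for this sketch
small = rng.choice([True, False], 100, p=[0.8, 0.2])

# Both implementations should produce the same (n_groups, n) array.
same = np.array_equal(my_consecutive_bools(small), my_consecutive_bools_loop(small))
print(same)

n_calls = 1000
for fn in (my_consecutive_bools, my_consecutive_bools_loop):
    seconds = timeit.timeit(lambda: fn(small), number=n_calls)
    print(f"{fn.__name__}: {seconds / n_calls * 1e6:.1f} µs per call")
```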
