numpy: check whether a condition is met three or more times in a row (Python)

d7v8vwbk  posted on 2023-06-23  in Python

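I have a global dataset of daily values, roughly 15000 days x 361 latitudes x 576 longitudes. The dataset is binary: 1 at each location/day where a condition is met, 0 where it is not. I want to keep the 1s only where they occur on 3 or more consecutive days. The data is currently in numpy arrays, but I also use xarray.
My first idea was a 3-day rolling sum, checking where it equals 3, but that only finds the middle days of a run of three or more days, not the days at either end.
Is there an efficient way to do this? Ideally without explicitly looping over every item, since that would take a very long time. Thanks in advance!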
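To make the rolling-sum problem concrete, here is a minimal sketch on a made-up 1-D series (the array `x` is an illustrative assumption, not data from the question). It uses `np.convolve` as the rolling sum and shows that only the interior days of each run are flagged:

```python
import numpy as np

# Hypothetical 11-day series with runs of length 3, 2 and 4.
x = np.array([1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1])

# Centered 3-day rolling sum: rolling[i] = x[i-1] + x[i] + x[i+1]
# (values outside the series count as 0).
rolling = np.convolve(x, np.ones(3, dtype=int), mode="same")

# A sum of 3 is only reached on days whose two neighbours are also 1,
# so the first and last day of every run are missed.
centers = (rolling == 3).astype(int)
print(centers)  # [0 1 0 0 0 0 0 0 1 1 0]
```

The wanted result for this series would be `[1 1 1 0 0 0 0 1 1 1 1]`, so the flagged centers still have to be spread to their neighbours.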

dgsult0t  #1

You can do this by first finding the runs of 3 with and, then shifting them back over two elements with or. Here is the easy-to-follow version:

import numpy as np
np.random.seed(17)
rands = np.random.randint(2, size=30)
# [1 1 1 0 0 1 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 0 1 1 1] 

and_rands = rands[:-2] & rands[1:-1] & rands[2:]
# [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]

lhs = np.concatenate((and_rands, np.zeros(2,dtype=and_rands.dtype)))
mid = np.concatenate((np.zeros(1,dtype=and_rands.dtype), and_rands, np.zeros(1, dtype=and_rands.dtype)))
rhs = np.concatenate((np.zeros(2,dtype=and_rands.dtype), and_rands))

result = lhs | mid | rhs
# [1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1]
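The same and-then-or idea can be sketched for an arbitrary minimum run length along the first axis. The function name `keep_runs` and its `min_len` parameter are hypothetical, introduced only for this illustration; it is a sketch of the technique above, not the answerer's code:

```python
import numpy as np

def keep_runs(arr, min_len=3):
    """Keep True values only inside runs of at least min_len along axis 0."""
    arr = np.asarray(arr).astype(bool)
    n = arr.shape[0]
    if n < min_len:
        return np.zeros_like(arr)
    m = n - min_len + 1
    # starts[t] is True when arr[t], ..., arr[t + min_len - 1] are all True.
    starts = arr[:m].copy()
    for k in range(1, min_len):
        starts &= arr[k:m + k]
    # Spread every start forward over the min_len days it covers.
    out = np.zeros_like(arr)
    for k in range(min_len):
        out[k:m + k] |= starts
    return out

# The 30-element example from the answer above.
rands = np.array([1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1,
                  1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1])
print(keep_runs(rands).astype(int))
```

With `min_len=3` this should reproduce the `result` array shown above. The two loops run `min_len` times, not once per element, so the work stays vectorized.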

Here is the same thing, but at full scale and more memory-efficient:

import numpy as np
np.random.seed(17)
DAYS, DIM2, DIM3 = 15000, 361, 576
# int8 keeps each full-size array at about 3.1 GB
rands = np.random.randint(2, size=(DAYS, DIM2, DIM3), dtype='i1')
ret = np.zeros((DAYS, DIM2, DIM3), dtype=rands.dtype)
# ret[t] = rands[t] & rands[t-1] & rands[t-2]: marks the last day of each run of 3
ret[2:, :, :] |= rands[2:, :, :]
ret[2:, :, :] &= rands[1:-1, :, :]
ret[2:, :, :] &= rands[:-2, :, :]
# spread each mark back over the two preceding days
ret[1:-1, :, :] |= ret[2:, :, :]
ret[:-2, :, :] |= ret[1:-1, :, :]

print(rands[:50, 0, 0])
# [1 1 1 0 0 1 0 1 0 1 0 0 1 0 1 0 1 0 0 1 1 0 0 0 1 0 1 0 0 1 1 1 0 1 1 0 0
#  1 1 0 1 1 1 0 1 1 1 0 0 1]
print(ret[:50, 0, 0])
# [1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0
#  0 0 0 1 1 1 0 1 1 1 0 0 0]
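One way to gain confidence in the in-place version before running it on the full dataset is to compare it with a plain per-column loop on a small array. The sketch below does that; the helper names `keep_runs_inplace` and `keep_runs_loop` and the small test shape are assumptions made for this check only:

```python
import numpy as np

def keep_runs_inplace(rands):
    """The in-place and/or steps from the answer, wrapped in a function."""
    ret = np.zeros_like(rands)
    ret[2:] |= rands[2:]
    ret[2:] &= rands[1:-1]
    ret[2:] &= rands[:-2]
    ret[1:-1] |= ret[2:]
    ret[:-2] |= ret[1:-1]
    return ret

def keep_runs_loop(col, min_len=3):
    """Slow reference: scan one 1-D column and keep runs of >= min_len."""
    out = np.zeros_like(col)
    start = None
    # A trailing 0 closes a run that reaches the end of the column.
    for i, v in enumerate(list(col) + [0]):
        if v and start is None:
            start = i
        elif not v and start is not None:
            if i - start >= min_len:
                out[start:i] = 1
            start = None
    return out

rng = np.random.default_rng(0)
small = rng.integers(0, 2, size=(40, 4, 5), dtype="i1")

fast = keep_runs_inplace(small)
slow = np.zeros_like(small)
for i in range(small.shape[1]):
    for j in range(small.shape[2]):
        slow[:, i, j] = keep_runs_loop(small[:, i, j])

print(np.array_equal(fast, slow))  # should print True
```

The loop is only a reference for small inputs; at 15000 x 361 x 576 it would be far too slow, which is the reason for the vectorized version.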
9q78igpj  #2

A more extensible approach:

from skimage.util import view_as_windows
import numpy as np

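def find_blocks(in_arr, window_shape = (3,1,1)):
    dims = len(window_shape)
    # A window reduces to True only where a full block of ones starts.
    windowed_view = view_as_windows(in_arr, window_shape)
    loc = np.logical_and.reduce(windowed_view, tuple(range(-dims, 0)))
    # Pad both sides so the result keeps in_arr's shape; padding only the
    # front would drop the last window-1 positions along each windowed axis.
    loc = np.pad(loc, tuple((i-1, i-1) for i in window_shape))
    windowed_loc = view_as_windows(loc, in_arr.shape)
    return np.logical_or.reduce(windowed_loc, tuple(range(dims)))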

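If you want numpy only, I have a recipe, window_nd, that replicates view_as_windows (with some added features, such as an axis parameter, so your window_shape does not need to have the same number of dimensions as your in_arr). The code below assumes that window_nd is available.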

def find_blocks_np(in_arr, window = 3, axis = 0):
    windowed_view = window_nd(in_arr, window, axis = axis)
    loc = np.logical_and.reduce(windowed_view, axis + 1)
    loc_shape = loc.shape
    padder = tuple((i-j, 0) for i, j in zip(in_arr.shape, loc.shape))
    loc = np.pad(loc, padder)
    windowed_loc = window_nd(loc, loc_shape)
    return np.logical_or.reduce(windowed_loc, 0)
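Since the `window_nd` recipe is not shown here, a numpy-only variant can also be sketched with `numpy.lib.stride_tricks.sliding_window_view`, which is part of NumPy 1.20 and later. This swaps in a different windowing helper and is not the answerer's recipe; the function name `find_blocks_swv` is hypothetical, and it assumes the array is at least `window` long along `axis`:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def find_blocks_swv(in_arr, window=3, axis=0):
    """Keep True values only inside runs of at least `window` along `axis`."""
    in_arr = np.asarray(in_arr).astype(bool)
    # Windows along `axis`; the window dimension is appended as the last axis.
    windows = sliding_window_view(in_arr, window, axis=axis)
    # starts is True where a full run of `window` ones begins.
    starts = windows.all(axis=-1)
    # Pad window-1 False values on both sides of `axis` so that the second
    # pass returns an array with the same shape as the input.
    pad = [(0, 0)] * in_arr.ndim
    pad[axis] = (window - 1, window - 1)
    padded = np.pad(starts, pad)
    # A position is kept if any of the `window` runs covering it is complete.
    return sliding_window_view(padded, window, axis=axis).any(axis=-1)

days = np.array([1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1])
print(find_blocks_swv(days).astype(int))  # [1 1 1 0 0 0 0 1 1 1 1]
```

Both passes reduce over a strided view, so no array with an extra window-sized dimension is materialized; the main temporaries are `starts` and its padded copy.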
