How to perform max/mean pooling on a 2D array using numpy

68bkxrlz  posted on 2022-12-29  in  Other

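Given a 2D (M x N) matrix and a 2D kernel (K x L), how do I return a matrix that is the result of max or mean pooling over the image with the given kernel?
I would like to use numpy if possible.
Note: M, N, K, L can each be even or odd, and they need not divide each other evenly, e.g. a 7x5 matrix and a 2x2 kernel.
Example of max pooling: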

matrix:
array([[  20,  200,   -5,   23],
       [ -13,  134,  119,  100],
       [ 120,   32,   49,   25],
       [-120,   12,    9,   23]])
kernel: 2 x 2
soln:
array([[  200,  119],
       [  120,   49]])

gxwragnw1#

You can use scikit-image's block_reduce:

import numpy as np
import skimage.measure

a = np.array([
      [  20,  200,   -5,   23],
      [ -13,  134,  119,  100],
      [ 120,   32,   49,   25],
      [-120,   12,    9,   23]
])
skimage.measure.block_reduce(a, (2,2), np.max)

which gives:

array([[200, 119],
       [120,  49]])

waxmsbnn2#

If the image size is evenly divisible by the kernel size, you can reshape the array and apply max or mean as needed:

import numpy as np

mat = np.array([[  20,  200,   -5,   23],
       [ -13,  134,  119,  100],
       [ 120,   32,   49,   25],
       [-120,   12,   9,   23]])

M, N = mat.shape
K = 2
L = 2

MK = M // K
NL = N // L
print(mat[:MK*K, :NL*L].reshape(MK, K, NL, L).max(axis=(1, 3)))
# [[200, 119], [120, 49]]
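
The same reshape gives mean pooling by swapping the reduction. A minimal sketch reusing the matrix above (the names `blocks` and `mean_pooled` are illustrative and not part of the answer):

```python
import numpy as np

mat = np.array([[  20,  200,   -5,   23],
                [ -13,  134,  119,  100],
                [ 120,   32,   49,   25],
                [-120,   12,    9,   23]])

K, L = 2, 2                      # kernel height and width
M, N = mat.shape
MK, NL = M // K, N // L          # number of whole blocks along each axis

# Axis 1 indexes rows inside a block, axis 3 indexes columns inside a block.
blocks = mat[:MK * K, :NL * L].reshape(MK, K, NL, L)
mean_pooled = blocks.mean(axis=(1, 3))
print(mean_pooled)
```

Cropping to `MK * K` rows and `NL * L` columns silently drops any leftover edge rows or columns, which is why the non-divisible case needs separate handling.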

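If the image size is not evenly divisible by the kernel size, the boundaries must be handled separately (as noted in the comments, this causes the matrix to be copied, which affects performance).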

mat = np.array([[20,  200,   -5,   23, 7],
                [-13,  134,  119,  100, 8],
                [120,   32,   49,   25, 12],
                [-120,   12,   9,   23, 15],
                [-57,   84,   19,   17, 82],
                ])
# soln
# [200, 119, 8]
# [120, 49, 15]
# [84, 19, 82]
M, N = mat.shape
K = 2
L = 2

MK = M // K
NL = N // L

# split the matrix into 'quadrants'
Q1 = mat[:MK * K, :NL * L].reshape(MK, K, NL, L).max(axis=(1, 3))
Q2 = mat[MK * K:, :NL * L].reshape(-1, NL, L).max(axis=2)
Q3 = mat[:MK * K, NL * L:].reshape(MK, K, -1).max(axis=1)
Q4 = mat[MK * K:, NL * L:].max()

# compose the individual quadrants into one new matrix
soln = np.vstack([np.c_[Q1, Q3], np.c_[Q2, Q4]])
print(soln)
# [[200 119   8]
#  [120  49  15]
#  [ 84  19  82]]

kzipqqlq3#

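Instead of splitting into "quadrants" as in Elliot's answer, we can pad the array so that it is evenly divisible, and then perform max or mean pooling.
Since pooling is often used in CNNs, the input array is usually 3D, so I wrote a function that works on both 2D and 3D arrays.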

import numpy

def pooling(mat,ksize,method='max',pad=False):
    '''Non-overlapping pooling on 2D or 3D data.

    <mat>: ndarray, input array to pool.
    <ksize>: tuple of 2, kernel size in (ky, kx).
    <method>: str, 'max' for max-pooling,
                   'mean' for mean-pooling.
    <pad>: bool, pad <mat> or not. If no pad, output has size
           n//f, n being <mat> size, f being kernel size.
           if pad, output has size ceil(n/f).

    Return <result>: pooled matrix.
    '''

    m, n = mat.shape[:2]
    ky,kx=ksize

    _ceil=lambda x,y: int(numpy.ceil(x/float(y)))

    if pad:
        ny=_ceil(m,ky)
        nx=_ceil(n,kx)
        size=(ny*ky, nx*kx)+mat.shape[2:]
        mat_pad=numpy.full(size,numpy.nan)
        mat_pad[:m,:n,...]=mat
    else:
        ny=m//ky
        nx=n//kx
        mat_pad=mat[:ny*ky, :nx*kx, ...]

    new_shape=(ny,ky,nx,kx)+mat.shape[2:]

    if method=='max':
        result=numpy.nanmax(mat_pad.reshape(new_shape),axis=(1,3))
    else:
        result=numpy.nanmean(mat_pad.reshape(new_shape),axis=(1,3))

    return result
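
To see the padded variant in action, here is a compact 2D-only restatement of the same idea, applied to the 5 x 5 matrix from the previous answer. The helper name `pool_padded` is hypothetical; it always pads with NaN and is meant as a sketch, not as a replacement for the function above.

```python
import numpy as np

def pool_padded(mat, ksize, method="max"):
    """Non-overlapping pooling of a 2D array, NaN-padded up to a multiple of ksize."""
    ky, kx = ksize
    m, n = mat.shape
    ny, nx = -(-m // ky), -(-n // kx)          # ceil(m / ky), ceil(n / kx)
    padded = np.full((ny * ky, nx * kx), np.nan)
    padded[:m, :n] = mat
    blocks = padded.reshape(ny, ky, nx, kx)
    # nanmax / nanmean ignore the NaN padding inside partial edge blocks
    reduce_fn = np.nanmax if method == "max" else np.nanmean
    return reduce_fn(blocks, axis=(1, 3))

mat = np.array([[  20, 200,  -5,  23,  7],
                [ -13, 134, 119, 100,  8],
                [ 120,  32,  49,  25, 12],
                [-120,  12,   9,  23, 15],
                [ -57,  84,  19,  17, 82]])

max_out = pool_padded(mat, (2, 2), method="max")
mean_out = pool_padded(mat, (2, 2), method="mean")
print(max_out)
print(mean_out)
```

The max result matches the quadrant-based output, but as floats, because the NaN-filled buffer forces a floating-point dtype. For mean pooling, edge blocks are averaged over the real values only, so the top-right block is the mean of 7 and 8.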

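Sometimes you may want overlapping pooling, where the stride is not equal to the kernel size. Here is a function that does this, with or without padding: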

def asStride(arr,sub_shape,stride):
    '''Get a strided sub-matrices view of an ndarray.
    See also skimage.util.shape.view_as_windows()
    '''
    s0,s1=arr.strides[:2]
    m1,n1=arr.shape[:2]
    m2,n2=sub_shape
    view_shape=(1+(m1-m2)//stride[0],1+(n1-n2)//stride[1],m2,n2)+arr.shape[2:]
    strides=(stride[0]*s0,stride[1]*s1,s0,s1)+arr.strides[2:]
    subs=numpy.lib.stride_tricks.as_strided(arr,view_shape,strides=strides)
    return subs

def poolingOverlap(mat,ksize,stride=None,method='max',pad=False):
    '''Overlapping pooling on 2D or 3D data.

    <mat>: ndarray, input array to pool.
    <ksize>: tuple of 2, kernel size in (ky, kx).
    <stride>: tuple of 2 or None, stride of pooling window.
              If None, same as <ksize> (non-overlapping pooling).
    <method>: str, 'max' for max-pooling,
                   'mean' for mean-pooling.
    <pad>: bool, pad <mat> or not. If no pad, output has size
           (n-f)//s+1, n being <mat> size, f being kernel size, s stride.
           if pad, output has size ceil(n/s).

    Return <result>: pooled matrix.
    '''

    m, n = mat.shape[:2]
    ky,kx=ksize
    if stride is None:
        stride=(ky,kx)
    sy,sx=stride

    _ceil=lambda x,y: int(numpy.ceil(x/float(y)))

    if pad:
        ny=_ceil(m,sy)
        nx=_ceil(n,sx)
        size=((ny-1)*sy+ky, (nx-1)*sx+kx) + mat.shape[2:]
        mat_pad=numpy.full(size,numpy.nan)
        mat_pad[:m,:n,...]=mat
    else:
        mat_pad=mat[:(m-ky)//sy*sy+ky, :(n-kx)//sx*sx+kx, ...]

    view=asStride(mat_pad,ksize,stride)

    if method=='max':
        result=numpy.nanmax(view,axis=(2,3))
    else:
        result=numpy.nanmean(view,axis=(2,3))

    return result
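
As a cross-check for the unpadded case, the same windows can be built with `numpy.lib.stride_tricks.sliding_window_view`, which computes the strides for you. This is a sketch that assumes NumPy 1.20 or newer; the function name `pool_windows` is illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def pool_windows(mat, ksize, stride=(1, 1), method="max"):
    """Overlapping pooling of a 2D array without padding."""
    # All stride-1 windows: shape (m - ky + 1, n - kx + 1, ky, kx)
    windows = sliding_window_view(mat, ksize)
    # Keep every stride-th window along each axis
    windows = windows[::stride[0], ::stride[1]]
    reduce_fn = np.max if method == "max" else np.mean
    return reduce_fn(windows, axis=(2, 3))

mat = np.array([[  20,  200,   -5,   23],
                [ -13,  134,  119,  100],
                [ 120,   32,   49,   25],
                [-120,   12,    9,   23]])

overlap = pool_windows(mat, (2, 2), stride=(1, 1))   # 3 x 3 output
blocks = pool_windows(mat, (2, 2), stride=(2, 2))    # 2 x 2 output
print(overlap)
print(blocks)
```

With stride equal to the kernel size this reduces to ordinary non-overlapping pooling, and the output size follows the (n-f)//s+1 rule from the docstring above.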

ie3xauqp4#

Another solution uses the little-known magic of np.maximum.at (or you can adapt this to mean-pooling using np.add.at and dividing)

import numpy as np

def max_pool(img, factor: int):
    """ Perform max pooling with a (factor x factor) kernel"""
    ds_img = np.full((img.shape[0] // factor, img.shape[1] // factor), -float('inf'), dtype=img.dtype)
    np.maximum.at(ds_img, (np.arange(img.shape[0])[:, None] // factor, np.arange(img.shape[1]) // factor), img)
    return ds_img

Example usage:

img = np.array([[20, 200, -5, 23],
                [-13, 134, 119, 100],
                [120, 32, 49, 25],
                [-120, 12, 9, 23]])

print(f'Input: \n{img}')

print(f"Output: \n{max_pool(img, factor=2)}")

This prints:

Input: 
[[  20  200   -5   23]
 [ -13  134  119  100]
 [ 120   32   49   25]
 [-120   12    9   23]]
Output: 
[[200 119]
 [120  49]]

Unfortunately, it appears to be somewhat slower, so I would still go with the solution provided by mdh.
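
The mean-pooling adaptation mentioned at the top of this answer (np.add.at, then divide) could look roughly like this. It is a sketch: the name `mean_pool` and the cropping of leftover edge rows and columns are assumptions, not part of the original answer.

```python
import numpy as np

def mean_pool(img, factor):
    """Mean pooling with a (factor x factor) kernel via unbuffered accumulation."""
    h, w = img.shape[0] // factor, img.shape[1] // factor
    img = img[:h * factor, :w * factor]          # drop rows/cols that do not fill a block
    sums = np.zeros((h, w), dtype=float)
    rows = np.arange(img.shape[0])[:, None] // factor   # output row of each pixel
    cols = np.arange(img.shape[1]) // factor            # output column of each pixel
    np.add.at(sums, (rows, cols), img)           # accumulate each pixel into its block
    return sums / (factor * factor)

img = np.array([[  20, 200,  -5,  23],
                [ -13, 134, 119, 100],
                [ 120,  32,  49,  25],
                [-120,  12,   9,  23]])

pooled = mean_pool(img, factor=2)
print(pooled)
```

The accumulator uses a float dtype so the division does not truncate for integer images.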


c3frrgcw5#

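Since the numpy documentation asks for "extreme care" when using "numpy.lib.stride_tricks.as_strided", here is another solution for 2D/3D pooling that does not use it.
With stride=1, it results in "same" padding. For stride>1, I am not 100% sure how "same" padding should be defined.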

import numpy as np

def pool3D(arr,
           kernel=(2, 2, 2),
           stride=(1, 1, 1),
           func=np.nanmax,
           ):
    # check inputs
    assert arr.ndim == 3
    assert len(kernel) == 3

    # create array with lots of padding around it, from which we grab stuff (could be more efficient, yes)
    arr_padded_shape = arr.shape + 2 * np.array(kernel)
    arr_padded = np.zeros(arr_padded_shape, dtype=arr.dtype) * np.nan
    arr_padded[
    kernel[0]:kernel[0] + arr.shape[0],
    kernel[1]:kernel[1] + arr.shape[1],
    kernel[2]:kernel[2] + arr.shape[2],
    ] = arr

    # create temporary array, which aggregates kernel elements in last axis
    size_x = 1 + (arr.shape[0]-1) // stride[0]
    size_y = 1 + (arr.shape[1]-1) // stride[1]
    size_z = 1 + (arr.shape[2]-1) // stride[2]
    size_kernel = np.prod(kernel)
    arr_tmp = np.empty((size_x, size_y, size_z, size_kernel), dtype=arr_padded.dtype)  # float dtype, so the NaN padding survives for integer input

    # fill temporary array
    kx_center = (kernel[0] - 1) // 2
    ky_center = (kernel[1] - 1) // 2
    kz_center = (kernel[2] - 1) // 2
    idx_kernel = 0
    for kx in range(kernel[0]):
        dx = kernel[0] + kx - kx_center
        for ky in range(kernel[1]):
            dy = kernel[1] + ky - ky_center
            for kz in range(kernel[2]):
                dz = kernel[2] + kz - kz_center
                arr_tmp[:, :, :, idx_kernel] = arr_padded[
                                               dx:dx + arr.shape[0]:stride[0],
                                               dy:dy + arr.shape[1]:stride[1],
                                               dz:dz + arr.shape[2]:stride[2],
                                               ]
                idx_kernel += 1

    # perform pool function
    arr_final = func(arr_tmp, axis=-1)
    return arr_final

def pool2D(arr,
           kernel=(2, 2),
           stride=(1, 1),
           func=np.nanmax,
           ):
    # check inputs
    assert arr.ndim == 2
    assert len(kernel) == 2

    # transform into 3D array with empty dimension?
    arr3D = arr[..., np.newaxis]
    kernel3D = kernel + (1,)
    stride3D = stride + (1,)
    arr3D_final = pool3D(arr3D, kernel3D, stride3D, func)
    arr2D_final = arr3D_final[:, :, 0]

    return arr2D_final

wz1wpwve6#

Max pooling with a 3 x 3 kernel over a square matrix a (stride 1, no padding):

import numpy as np
def max_pool_3x3(a):  # stride 1, no padding; assumes a square input
    a = np.array(a)
    return [[a[i-1:i+2,j-1:j+2].max() for j in range(1,len(a)-1)] for i in range(1,len(a)-1)]
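
For a concrete picture of what the list comprehension above produces, here it is wrapped in a function and applied to the 4 x 4 matrix from the question. The wrapper name `max_pool_3x3_demo` is made up for this example.

```python
import numpy as np

def max_pool_3x3_demo(a):
    """3 x 3 max pooling with stride 1 and no padding, for a square input."""
    a = np.array(a)
    n = len(a)
    # One output value per interior element (i, j), taken over its 3 x 3 neighbourhood
    return [[a[i-1:i+2, j-1:j+2].max() for j in range(1, n - 1)]
            for i in range(1, n - 1)]

a = [[  20, 200,  -5,  23],
     [ -13, 134, 119, 100],
     [ 120,  32,  49,  25],
     [-120,  12,   9,  23]]

result = max_pool_3x3_demo(a)
print(result)
```

Because the window moves one element at a time, a 4 x 4 input yields a 2 x 2 output here, and neighbouring windows overlap heavily.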

z3yyvxxp7#

This function applies max pooling with a kernel of any size, using only numpy functions.

import numpy as np

def max_pooling(feature_map : np.ndarray, kernel : tuple) -> np.ndarray:
    """
    Applies max pooling to a feature map.

    Parameters
    ----------
    feature_map : np.ndarray
        A 2D or 3D feature map to apply max pooling to.
    kernel : tuple
        The size of the kernel to use for max pooling.

    Returns
    -------
    np.ndarray
        The feature map after max pooling was applied.
    """

    # Pad the bottom and right edges so both spatial dimensions are divisible by the kernel.
    # Note: zero padding can change the max of edge windows whose values are all negative.
    pad_h = -feature_map.shape[0] % kernel[0]
    pad_w = -feature_map.shape[1] % kernel[1]
    extra = ((0, 0),) * (feature_map.ndim - 2)  # leave any channel axis unpadded
    feature_map = np.pad(feature_map, ((0, pad_h), (0, pad_w)) + extra, 'constant')

    # Apply max pooling to the padded feature map
    pooled = feature_map.reshape((feature_map.shape[0] // kernel[0],
                                  kernel[0],
                                  feature_map.shape[1] // kernel[1],
                                  kernel[1]) + feature_map.shape[2:]
                                 ).max(axis=(1, 3))
    return pooled
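
One caveat with 'constant' (zero) padding is that a padded zero can win the max when an edge window contains only negative values. A possible workaround, sketched here for 2D input with the hypothetical name `max_pool_neg_inf`, is to pad with -inf instead:

```python
import numpy as np

def max_pool_neg_inf(feature_map, kernel):
    """2D max pooling that pads with -inf so padding never wins the max."""
    kh, kw = kernel
    pad_h = -feature_map.shape[0] % kh           # rows needed to reach a multiple of kh
    pad_w = -feature_map.shape[1] % kw           # columns needed to reach a multiple of kw
    padded = np.pad(feature_map.astype(float), ((0, pad_h), (0, pad_w)),
                    mode="constant", constant_values=-np.inf)
    h, w = padded.shape
    return padded.reshape(h // kh, kh, w // kw, kw).max(axis=(1, 3))

neg = np.array([[-1, -2, -3],
                [-4, -5, -6],
                [-7, -8, -9]])

pooled_inf = max_pool_neg_inf(neg, (2, 2))
pooled_zero = np.pad(neg, ((0, 1), (0, 1))).reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled_inf)    # edge windows keep their true (negative) maxima
print(pooled_zero)   # edge windows are overridden by the zero padding
```

The cast to float is needed because -inf has no integer representation; the trade-off is that the output is floating point even for integer input.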

2fjabf4q8#

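You can also do this with numpy's as_strided() function. The idea is to build a view of the input's sub-matrices from the given kernel size and strides, and then simply take the maximum along the height and width axes.

Note: the main benefit of this approach is that it can be extended to inputs with channels (depth) and batches!
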
import numpy as np

np.random.seed(10)

# input
X = np.array([[  20,  200,   -5,   23],
              [ -13,  134,  119,  100],
              [ 120,   32,   49,   25],
              [-120,   12,    9,   23]])

Nh, Nw = X.shape # input size

Kh, Kw = (2,2) # Kernel size (along height and width)

sh, sw = (2,2) # strides along height and width

X
>>> array([[  20,  200,   -5,   23],
           [ -13,  134,  119,  100],
           [ 120,   32,   49,   25],
           [-120,   12,    9,   23]])
Oh = (Nh-Kh)//sh + 1 # output height
Ow = (Nw-Kw)//sw + 1 # output width

# creating appropriate strides
strides = (sh*Nw, sw, Nw, 1) 
strides = tuple(i * X.itemsize for i in strides) 

subM = np.lib.stride_tricks.as_strided(X, shape=(Oh, Ow, Kh, Kw),
                                       strides=strides)
subM
>>> array([[[[  20,  200],
            [ -13,  134]],

           [[  -5,   23],
            [ 119,  100]]],

          [[[ 120,   32],
            [-120,   12]],

           [[  49,   25],
            [   9,   23]]]])
# taking maximum along the height and width axes. 
np.max(subM, axis=(2,3))
>>> array([[200, 119],
           [120,  49]])

We have the output we need!
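
The extension to batches and channels mentioned in the note could be sketched as follows. The (batch, channel, height, width) layout and the name `max_pool_nchw` are assumptions made for this example; the strides are read from the array itself rather than computed from `itemsize`, so non-contiguous inputs are handled too.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def max_pool_nchw(X, ksize, stride):
    """Max pooling over the last two axes of a (batch, channel, height, width) array."""
    N, C, H, W = X.shape
    Kh, Kw = ksize
    sh, sw = stride
    Oh = (H - Kh) // sh + 1                      # output height
    Ow = (W - Kw) // sw + 1                      # output width
    sN, sC, sH, sW = X.strides                   # byte strides of the input
    windows = as_strided(
        X,
        shape=(N, C, Oh, Ow, Kh, Kw),
        strides=(sN, sC, sh * sH, sw * sW, sH, sW),
        writeable=False,                         # read-only view guards against accidental writes
    )
    return windows.max(axis=(4, 5))

X = np.array([[  20,  200,   -5,   23],
              [ -13,  134,  119,  100],
              [ 120,   32,   49,   25],
              [-120,   12,    9,   23]])

batch = np.stack([X, -X])[None]                  # shape (1, 2, 4, 4): one sample, two channels
out = max_pool_nchw(batch, (2, 2), (2, 2))
print(out.shape)
print(out[0, 0])
```

The first channel reproduces the 2D result above, and each channel and batch entry is pooled independently because only the height and width axes are windowed.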
