Can numpy bincount work with 2D arrays?

0lvr5msh  posted on 2023-03-30  in Other

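I'm seeing behavior from numpy bincount that I can't make sense of. I want to bin the values of a 2D array row by row, and I get the results below. Why does it work with dbArray but fail with simarray?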

>>> dbArray
array([[1, 0, 1, 0, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 0, 1, 1],
       [1, 0, 0, 0, 0],
       [0, 0, 0, 1, 1],
       [0, 1, 0, 1, 0]])
>>> N.apply_along_axis(N.bincount,1,dbArray)
array([[2, 3],
       [0, 5],
       [1, 4],
       [4, 1],
       [3, 2],
       [3, 2]], dtype=int64)
>>> simarray
array([[2, 0, 2, 0, 2],
       [2, 1, 2, 1, 2],
       [2, 1, 1, 1, 2],
       [2, 0, 1, 0, 1],
       [1, 0, 1, 1, 2],
       [1, 1, 1, 1, 1]])
>>> N.apply_along_axis(N.bincount,1,simarray)

Traceback (most recent call last):
  File "<pyshell#31>", line 1, in <module>
    N.apply_along_axis(N.bincount,1,simarray)
  File "C:\Python27\lib\site-packages\numpy\lib\shape_base.py", line 118, in apply_along_axis
    outarr[tuple(i.tolist())] = res
ValueError: could not broadcast input array from shape (2) into shape (3)

smdnsysy1#

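The problem is that bincount doesn't always return an object of the same shape, in particular when some values are missing from a row. For example: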

>>> m = np.array([[0,0,1],[1,1,0],[1,1,1]])
>>> np.apply_along_axis(np.bincount, 1, m)
array([[2, 1],
       [1, 2],
       [0, 3]])
>>> [np.bincount(m[i]) for i in range(m.shape[0])]
[array([2, 1]), array([1, 2]), array([0, 3])]

works, but:

>>> m = np.array([[0,0,0],[1,1,0],[1,1,0]])
>>> m
array([[0, 0, 0],
       [1, 1, 0],
       [1, 1, 0]])
>>> [np.bincount(m[i]) for i in range(m.shape[0])]
[array([3]), array([1, 2]), array([1, 2])]
>>> np.apply_along_axis(np.bincount, 1, m)
Traceback (most recent call last):
  File "<ipython-input-49-72e06e26a718>", line 1, in <module>
    np.apply_along_axis(np.bincount, 1, m)
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/shape_base.py", line 117, in apply_along_axis
    outarr[tuple(i.tolist())] = res
ValueError: could not broadcast input array from shape (2) into shape (1)

does not.
You can use the minlength parameter, passing it in with a lambda, functools.partial, or some other way:

>>> np.apply_along_axis(lambda x: np.bincount(x, minlength=2), axis=1, arr=m)
array([[3, 0],
       [1, 2],
       [1, 2]])
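If the number of bins isn't known ahead of time, one option is to derive `minlength` from the array's global maximum and bind it with `functools.partial`. The following is a minimal sketch, assuming non-negative integer input; `rowwise_bincount` is a hypothetical helper name, not part of numpy.

```python
from functools import partial

import numpy as np


def rowwise_bincount(arr):
    """Bincount each row of a 2D array of non-negative ints, padded to a common width."""
    arr = np.asarray(arr)
    # Every row must yield the same number of bins, so use the global maximum.
    nbins = int(arr.max()) + 1
    return np.apply_along_axis(partial(np.bincount, minlength=nbins), 1, arr)


# The array from the question that originally raised ValueError.
simarray = np.array([[2, 0, 2, 0, 2],
                     [2, 1, 2, 1, 2],
                     [2, 1, 1, 1, 2],
                     [2, 0, 1, 0, 1],
                     [1, 0, 1, 1, 2],
                     [1, 1, 1, 1, 1]])
counts = rowwise_bincount(simarray)
print(counts)
```

Each row of `counts` now has three entries (for the values 0, 1 and 2), so the rows stack into a regular 2D array.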

lnlaulya2#

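As @DSM already mentioned, you can't bincount a 2D array without knowing the array's maximum value, because the per-row results would have inconsistent sizes.
But thanks to numpy's powerful indexing, it's easy to implement a faster 2D bincount, since it doesn't use concatenation or anything like that.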

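import numpy as np

def bincount2d(arr, bins=None):
    # Default to one bin per value from 0 up to the global maximum.
    if bins is None:
        bins = np.max(arr) + 1
    count = np.zeros(shape=[len(arr), bins], dtype=np.int64)
    # indexing[i, j] == i: the row index of every element of arr.
    indexing = (np.ones_like(arr).T * np.arange(len(arr))).T
    # Unbuffered add, so repeated (row, value) pairs are all counted.
    np.add.at(count, (indexing, arr), 1)

    return count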
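For comparison, another common way to get a row-wise bincount is to shift each row into its own block of bin indices and make a single flat `np.bincount` call. This is a sketch of that alternative, not the method above; `bincount2d_offset` is a hypothetical name, and it assumes non-negative integers that are all smaller than `bins`.

```python
import numpy as np


def bincount2d_offset(arr, bins=None):
    """Row-wise bincount via one flat np.bincount call (non-negative ints assumed)."""
    arr = np.asarray(arr)
    if bins is None:
        bins = int(arr.max()) + 1
    n_rows = arr.shape[0]
    # Shift row i by i * bins so each row lands in its own block of bins.
    shifted = arr + bins * np.arange(n_rows)[:, None]
    # minlength pads the tail in case the last row lacks the largest value.
    flat = np.bincount(shifted.ravel(), minlength=n_rows * bins)
    return flat.reshape(n_rows, bins)


m = np.array([[0, 0, 0],
              [1, 1, 0],
              [1, 1, 0]])
result = bincount2d_offset(m)
print(result)
```

For this `m` the result matches the `minlength=2` output shown in the first answer. If a value is greater than or equal to `bins`, its count spills into the next row's block, so the assumption about `bins` matters.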

doinxwow3#

Here is a function that does exactly what you're asking for, but without any loops.

def sub_sum_partition(a, partition):
    """
    Generalization of np.bincount(partition, a).
    Sums rows of a matrix for each value of array of non-negative ints.

    :param a: array_like
    :param partition: array_like, 1 dimension, nonnegative ints
    :return: matrix of shape ('one larger than the largest value in partition', a.shape[1:]). The i-th element is
    the sum of the rows j of 'a' such that partition[j] == i
    """
    assert partition.shape == (len(a),)
    n = np.prod(a.shape[1:], dtype=int)
    bins = ((np.tile(partition, (n, 1)) * n).T + np.arange(n, dtype=int)).reshape(-1)
    sums = np.bincount(bins, a.reshape(-1))
    if n > 1:
        sums = sums.reshape(-1, *a.shape[1:])
    return sums
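To make the docstring concrete, here is a small cross-check that computes the same grouped row sums with `np.add.at`. It is only a reference sketch: `sub_sum_partition_ref` is a hypothetical name, and it assumes `partition` holds non-negative ints, one per row of `a`.

```python
import numpy as np


def sub_sum_partition_ref(a, partition):
    """Loop-free reference: sum the rows of `a` that share a label in `partition`."""
    a = np.asarray(a, dtype=float)
    partition = np.asarray(partition)
    out = np.zeros((partition.max() + 1,) + a.shape[1:])
    # Unbuffered add, so repeated labels accumulate instead of overwriting.
    np.add.at(out, partition, a)
    return out


a = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 8.0]])
partition = np.array([0, 1, 0, 2])
sums = sub_sum_partition_ref(a, partition)
print(sums)
```

Rows 0 and 2 share label 0, so the first output row is their sum, `[6., 8.]`; labels 1 and 2 each pick up a single row. Calling `sub_sum_partition(a, partition)` on the same inputs should give the same values.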
