How do I find the number of occurrences of every element of a NumPy array?

Posted by s4chpxco on 2023-06-23 in Other


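Given an array of integers, I want to get an array of the same size in which each value is the number of occurrences of the corresponding element in the original array.
For example, given the following array: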

a = np.array([1, 1, 4, 10, 5, 3, 5, 5, 8, 9])

this should be the result:

array([2, 2, 1, 1, 3, 1, 3, 3, 1, 1])

Although this is straightforward with collections.Counter or the built-in list.count(), I am looking for a more efficient approach that scales to large lists.

numpy

Source: https://stackoverflow.com/questions/76443854/how-to-find-a-number-of-occurrences-of-every-element-of-a-numpy-array

5 Answers


xpcnnkqh #1

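You can use np.unique with the parameters return_inverse and return_counts. Index the array returned for return_counts with the array returned for return_inverse to get the desired result.

return_inverse : bool, optional. If True, also return the indices of the unique array (for the specified axis, if provided) that can be used to reconstruct ar.
return_counts : bool, optional. If True, also return the number of times each unique item appears in ar.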

_, idx, c = np.unique(a, return_inverse=True, return_counts=True)
c[idx]
# array([2, 2, 1, 1, 3, 1, 3, 3, 1, 1])
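
To see why indexing the counts with the inverse indices works, it may help to print the intermediate arrays. The following is a minimal sketch on the question's array; the names uniq, idx, and counts are illustrative only.

```python
import numpy as np

a = np.array([1, 1, 4, 10, 5, 3, 5, 5, 8, 9])

# uniq:   sorted distinct values of a
# idx:    for each element of a, its position in uniq
# counts: number of occurrences of each entry of uniq
uniq, idx, counts = np.unique(a, return_inverse=True, return_counts=True)

print(uniq)    # [ 1  3  4  5  8  9 10]
print(idx)     # [0 0 2 6 3 1 3 3 4 5]
print(counts)  # [2 1 1 3 1 1 1]

# Looking up counts at each element's position gives per-element counts
result = counts[idx]
print(result)  # [2 2 1 1 3 1 3 3 1 1]
```

Because np.unique only needs to sort the values, this approach should also work for negative or non-integer data.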


olhwl3o2 #2

Compare the array with itself using broadcasting, then sum the True values:

import numpy as np

a = np.array([1, 1, 4, 10, 5, 3, 5, 5, 8, 9])

c = (a[:,None]==a).sum(axis=0)

print(c) # [2 2 1 1 3 1 3 3 1 1]
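
A smaller input makes the broadcast comparison easier to follow. This is only an illustrative sketch; the three-element array and the name eq are not from the answer.

```python
import numpy as np

a = np.array([5, 3, 5])

# a[:, None] has shape (3, 1); comparing it with a (shape (3,))
# broadcasts to a (3, 3) boolean matrix where eq[i, j] is a[i] == a[j]
eq = a[:, None] == a
print(eq.astype(int))
# [[1 0 1]
#  [0 1 0]
#  [1 0 1]]

# Summing down each column counts how many elements equal a[j]
c = eq.sum(axis=0)
print(c)  # [2 1 2]
```

Note that the intermediate matrix holds n × n booleans, so memory grows quadratically with the array length: an input of 100,000 elements would need roughly 10 GB for the comparison alone.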


bis0qfac #3

See numpy.bincount:
Count number of occurrences of each value in array of non-negative ints.
Specifically,

np.bincount(a)[a]

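will produce the result you want. bincount produces an array that maps each value 0...amax(a) to the number of times that value occurs. You are not interested in how many times 6 or 7 occurs. Instead, you seem to want the mapping from each input value to its number of occurrences. This can be achieved by indexing the output of bincount with the original values of a.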
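
As a sketch of what happens at each step, the intermediate bincount array can be printed before indexing. The second half is an assumption that goes beyond the answer: since bincount only accepts non-negative integers, one possible workaround for negative values is to shift the array by its minimum first (the names neg and shifted are hypothetical).

```python
import numpy as np

a = np.array([1, 1, 4, 10, 5, 3, 5, 5, 8, 9])

# b[v] is the number of times the value v occurs, for v in 0..a.max()
b = np.bincount(a)
print(b)     # [0 2 0 1 1 3 0 0 1 1 1]

# Indexing b with a picks out the count for each original element
print(b[a])  # [2 2 1 1 3 1 3 3 1 1]

# Assumption: for arrays with negative integers, shift by the minimum
# so that every value is non-negative before counting
neg = np.array([-2, 0, -2, 3])
shifted = neg - neg.min()
neg_counts = np.bincount(shifted)[shifted]
print(neg_counts)  # [2 1 2 1]
```

The length of the bincount output is a.max() + 1, so this approach suits arrays whose values are small relative to available memory.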


ct3nt3jp #4

After receiving three working solutions, I decided to check their performance with the following code:

import timeit

import numpy as np

TRIALS = 10**6

a = np.array([1, 1, 4, 10, 5, 3, 5, 5, 8, 9])

def np_bincount_version(a):
    b = np.bincount(a)
    return np.array([b[item] for item in a])

def np_unique_version(a):
    _, idx, c = np.unique(a, return_inverse=True, return_counts=True)
    return c[idx]

def np_self_compare_version(a):
    return (a[:,None]==a).sum(axis=0)

assert (np_bincount_version(a) == np_unique_version(a)).all()
assert (np_unique_version(a)   == np_self_compare_version(a)).all()

np_bincount_version_time = timeit.timeit(stmt='np_bincount_version(a)', number=TRIALS, globals=globals())
np_unique_version_time = timeit.timeit(stmt='np_unique_version(a)', number=TRIALS, globals=globals())
np_self_compare_version_time = timeit.timeit(stmt='np_self_compare_version(a)', number=TRIALS, globals=globals())

min_time = min([np_bincount_version_time, np_unique_version_time, np_self_compare_version_time])

print('Absolute time:')
print(f'Bincount version: {np_bincount_version_time:.2f} seconds per {TRIALS} runs.')
print(f'Numpy unique version: {np_unique_version_time:.2f} seconds per {TRIALS} runs.')
print(f'Self compare version: {np_self_compare_version_time:.2f} seconds per {TRIALS} runs.')
print()
print('Normalized time:')
print(f'Bincount version: {np_bincount_version_time / min_time:.2f}x')
print(f'Numpy unique version: {np_unique_version_time / min_time:.2f}x')
print(f'Self compare version: {np_self_compare_version_time / min_time:.2f}x')

The results are as follows:

Absolute time:
Bincount version: 4.94 seconds per 1000000 runs.
Numpy unique version: 21.02 seconds per 1000000 runs.
Self compare version: 3.74 seconds per 1000000 runs.

Normalized time:
Bincount version: 1.32x
Numpy unique version: 5.61x
Self compare version: 1.00x

(Note that I am assuming the performance characteristics are roughly the same for arrays of other sizes.)

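The solution based on comparing the elements with each other appears to perform slightly better than the np.bincount version, although in the latter I used a list comprehension to build the result array, which may not be optimal (feel free to post a NumPy-idiomatic version!). Meanwhile, the np.unique version clearly lags far behind the other two.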
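
A NumPy-idiomatic bincount variant could replace the list comprehension with fancy indexing, as in the bincount answer. The sketch below also reruns the comparison on a larger input, because the self-compare version does work proportional to n², so the size-independence assumption is worth checking. The function name np_bincount_indexed, the random test array, and the run count are assumptions made for illustration; no timings are claimed here.

```python
import timeit

import numpy as np

def np_bincount_indexed(a):
    # Fancy indexing replaces the Python-level list comprehension
    return np.bincount(a)[a]

def np_unique_version(a):
    _, idx, c = np.unique(a, return_inverse=True, return_counts=True)
    return c[idx]

def np_self_compare_version(a):
    return (a[:, None] == a).sum(axis=0)

# Hypothetical test input: 2,000 random integers in [0, 100)
rng = np.random.default_rng(0)
big = rng.integers(0, 100, size=2_000)

# All three versions must agree before their timings are compared
expected = np_unique_version(big)
assert (np_bincount_indexed(big) == expected).all()
assert (np_self_compare_version(big) == expected).all()

RUNS = 20
for fn in (np_bincount_indexed, np_unique_version, np_self_compare_version):
    seconds = timeit.timeit(lambda: fn(big), number=RUNS)
    print(f'{fn.__name__}: {seconds:.4f} seconds per {RUNS} runs')
```

Changing the size argument lets you see how the ranking shifts as the array grows on your own machine.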


kfgdxczn #5

import numpy as np

# Example array
arr = np.array([1, 1, 4, 10, 5, 3, 5, 5, 8, 9])

# Find unique elements and their counts
unique_elements, counts = np.unique(arr, return_counts=True)

# Print the results
for element, count in zip(unique_elements, counts):
    print(f"Element {element} occurs {count} times")
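
The snippet above reports one count per distinct value, whereas the question asks for one count per element. One way to bridge that gap, sketched here as an assumption rather than as part of the answer, is to look up each element's position in the sorted unique array with np.searchsorted (the names positions and per_element are illustrative).

```python
import numpy as np

arr = np.array([1, 1, 4, 10, 5, 3, 5, 5, 8, 9])
unique_elements, counts = np.unique(arr, return_counts=True)

# unique_elements is sorted and contains every value of arr, so
# searchsorted returns the exact index of each element within it
positions = np.searchsorted(unique_elements, arr)

# Look up the count for each original element
per_element = counts[positions]
print(per_element)  # [2 2 1 1 3 1 3 3 1 1]
```

This yields the same result as passing return_inverse=True to np.unique, at the cost of an extra binary search per element.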

