Speed up NumPy process across multiple arrays in parallel

Posted by jljoyd4f on 2023-03-30 in Other

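I have built a script that, inside a loop, finds the 8 indices nearest to the current index; it is essentially a moving-window algorithm. The parallel component needs to do this across multiple 2-D arrays, whose number may vary but which always share the same dimensions (e.g. 2800 i x 1200 j). I have tested the script with 12 arrays, all of dtype float32 with a maximum decimal precision of 8.
First, let's start with the nearest-neighbor part of the script, shown below: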

import numpy as np
import multiprocessing as mpr

def get_neighbors(arr, origin, num_neighbors = 8):
    coords = np.array([[i,j] for (i,j),value in np.ndenumerate(arr)]).reshape(arr.shape + (2,))
    distances = np.linalg.norm(coords - origin, axis = -1)
    neighbor_limit = np.sort(distances.ravel())[num_neighbors]

    window = np.where(distances <= neighbor_limit)
    exclude_window = np.where(distances > neighbor_limit)
    
    return window, exclude_window, distances

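I have built a static array (named gridranger) that drives the moving-window index loop and supplies the window index coordinates for all the other 2-D arrays, given that they are all the same size. The goal of the script is to extract the values at the window indices from every array into a list and then run some analysis on them. That part is shown below. Note that the variable grids contains the names of all the variables mapped to each corresponding 2-D array: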

def extractor(queue, gridin, windowin):
    extract_values = []
    for i in range(0, len(windowin[0])):
        extract_values.append(gridin[windowin[0][i], windowin[1][i]])
    queue.put(extract_values)

def parallel():
    for index, val in np.ndenumerate(gridranger):
        window, exclude, distances = get_neighbors(gridranger, [index[0], index[1]])
        outarr = np.column_stack((window[0], window[1]))
        outvalues, processes = [], []
        q = mpr.Queue()
        for grid in grids:
            pro = mpr.Process(target=extractor, args=(q, grid, window))
            processes.extend([pro])
            pro.start()
        for p in processes:
            extract_values = q.get()
            outvalues.append(extract_values)
        for p in processes:
            p.join()
#       return outvalues
        print(index, outvalues)

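The problem I have is how long this takes to run with multiprocessing: about 7.5 to 8.5 seconds on average. For 2-D arrays as large as the ones I am running this moving window over, that is clearly far too inefficient. What steps can I take to drastically reduce this run time?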

numpy

Source: https://stackoverflow.com/questions/75868166/speed-up-numpy-process-across-multiple-arrays-in-parallel

1 Answer

hs1rzwqc1#

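I can't speak to the parallelization, but your get_neighbors function does a lot of unnecessary work. On every call you build a new coordinate array, do a full sort of the distances from the origin to every cell, and scan the array twice for the include and exclude indices. The function below still repeats some work (the call to ogrid, for example, could be moved outside the function), but it brings the run time down by a factor of about 100: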

def get_neighbors_vectorized(arr, origin, num_neighbors = 8):
    n, m = arr.shape
    # ogrid returns open grids: an (n, 1) shaped array and a (1, m) shaped array
    xcoord, ycoord = np.ogrid[0:n, 0:m]
    # the squared distances from the origin are broadcast together by the +
    distances = np.sqrt((xcoord - origin[0])**2 + (ycoord - origin[1])**2)
    # argpartition on the flattened array: the first k entries of the result are
    # the indices of the k smallest distances (in no particular order). This
    # replaces the sort and where calls in your original function
    partition = np.argpartition(distances, num_neighbors, axis=None)
    # convert from the raveled array indices back to 2d
    # note: the origin itself (distance 0) is among the returned indices
    window = (partition[:num_neighbors] // m, partition[:num_neighbors] % m)
    # only the window itself seems to be used in the subsequent functions
    return window
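
As a rough sketch of the "move ogrid outside the function" idea mentioned above: the coordinate grids depend only on the array shape, so they can be built once and reused for every origin. The names make_neighbor_finder and find are hypothetical, not part of the original code, and this version compares squared distances (which gives the same ordering as Euclidean distances) and uses np.unravel_index in place of the // and % arithmetic.

```python
import numpy as np

def make_neighbor_finder(shape, num_neighbors=8):
    """Return a function that finds the nearest cells for a given origin.

    Hypothetical helper: the open coordinate grids are built once here
    instead of on every call.
    """
    n, m = shape
    # Open grids of shape (n, 1) and (1, m); they broadcast to (n, m).
    rows, cols = np.ogrid[0:n, 0:m]

    def find(origin):
        # Squared distances are enough: sqrt does not change the ordering.
        sq_dist = (rows - origin[0]) ** 2 + (cols - origin[1]) ** 2
        # Flat indices of the num_neighbors smallest distances (unordered).
        nearest = np.argpartition(sq_dist, num_neighbors, axis=None)[:num_neighbors]
        # Convert flat indices back to (row_indices, col_indices).
        return np.unravel_index(nearest, shape)

    return find

# Build once per array shape, then call for each origin in the loop.
find = make_neighbor_finder((200, 100))
window = find((12, 4))
```

Whether this gives a measurable gain over the function above depends on the array size; it mainly removes the repeated grid construction.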

Benchmarks in ipython3:

In [2]: arr = np.arange(20000).reshape(200, 100)

In [3]: %timeit get_neighbors(arr, [12, 4])
7.36 ms ± 7.14 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [4]: %timeit get_neighbors_vectorized(arr, [12, 4])
89.2 µs ± 64.1 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
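
Timing aside, it may be worth checking that the two functions select the same cells. The sketch below repeats both functions (the original reduced to its window output) and compares them on a small array. The helper as_cells and the 20 x 10 test array are assumptions made for illustration. For an interior origin, the original keeps every cell with a distance up to the value at sorted position num_neighbors, so it returns the origin plus 8 neighbors, while the vectorized version returns num_neighbors cells in total, origin included.

```python
import numpy as np

def get_neighbors(arr, origin, num_neighbors=8):
    # Approach from the question, reduced to the window it returns.
    coords = np.array([[i, j] for (i, j), _ in np.ndenumerate(arr)])
    coords = coords.reshape(arr.shape + (2,))
    distances = np.linalg.norm(coords - origin, axis=-1)
    neighbor_limit = np.sort(distances.ravel())[num_neighbors]
    return np.where(distances <= neighbor_limit)

def get_neighbors_vectorized(arr, origin, num_neighbors=8):
    # Approach from the answer.
    n, m = arr.shape
    xcoord, ycoord = np.ogrid[0:n, 0:m]
    distances = np.sqrt((xcoord - origin[0]) ** 2 + (ycoord - origin[1]) ** 2)
    partition = np.argpartition(distances, num_neighbors, axis=None)
    return (partition[:num_neighbors] // m, partition[:num_neighbors] % m)

def as_cells(window):
    # Hypothetical helper: turn (row_indices, col_indices) into a set of pairs.
    return set(zip(window[0].tolist(), window[1].tolist()))

arr = np.arange(200).reshape(20, 10)
origin = [12, 4]  # an interior cell

old = as_cells(get_neighbors(arr, origin))
new = as_cells(get_neighbors_vectorized(arr, origin))
# Asking for one more cell makes the vectorized window match the original
# for this interior origin.
matched = as_cells(get_neighbors_vectorized(arr, origin, num_neighbors=9))

print(len(old), len(new), matched == old)
```

If the origin cell is not wanted in the window, it can be filtered out afterwards in either version. Near the array edges, ties in distance may make the two functions differ in other ways, which this check does not cover.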

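I don't know what the end purpose of these routines is, but it looks like you're just doing some kind of moving window, which means you could probably replace get_neighbors altogether.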
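
One possible way to do that, sketched under assumptions: if the window is always the 3 x 3 block around the current cell, the grids can be stacked into one 3-D array and all windows exposed at once with numpy.lib.stride_tricks.sliding_window_view (available in NumPy 1.20 and later), with no per-cell distance computation and no processes. The function name extract_windows and the fixed 3 x 3 size are assumptions, not something stated in the question, and cells on the border are not covered here, whereas the original code still picks the nearest cells there.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def extract_windows(grids, size=3):
    """Return all size x size windows for every grid.

    grids: sequence of 2-D arrays that all share one shape (assumed).
    Result shape: (k, n - size + 1, m - size + 1, size, size).
    """
    # Stack the k grids into a single (k, n, m) array.
    stacked = np.stack(grids)
    # Read-only view over the two spatial axes; window data is not copied.
    return sliding_window_view(stacked, (size, size), axis=(1, 2))

# Small illustrative input: three 4 x 5 float32 grids.
grids = [np.arange(20, dtype=np.float32).reshape(4, 5) * s for s in (1, 2, 3)]
windows = extract_windows(grids)

# Values around the interior cell (i, j) = (2, 3) in every grid at once:
# the window starts at (i - 1, j - 1), so index [:, 1, 2] has shape (3, 3, 3).
around = windows[:, 1, 2]
```

If border cells matter, padding the stacked array first (for example with np.pad and NaN values) is one option, at the cost of handling the padded values in the later analysis.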

Answered 2023-03-30
