NumPy: creating slices inside a for loop

hmae6n7t  posted on 2023-08-05 in Other

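I'm new to NumPy. My basic understanding of it is that you gain efficiency by applying operations to whole arrays, because that moves the for loop into C code. I tried to make the calc function below more efficient by applying this principle, using slices and x1-x2. This removed the inner for loop I originally had in the code and improved performance considerably. Is there anything else I can do to make the function more efficient, or do I need to implement it in C?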

import math
import numpy as np
import time

def calc(x, i, K, N):
    r = np.empty(K)
    r[0] = 0
    for k in range(1, K):
        o = math.floor((k + N) / 2)
        x1 = x[i-o:i-o+N]
        x2 = x[i-o+k:i-o+N+k]
        s = np.square(x1-x2)
        r[k] = np.sum(s)/len(s)
    return r

input = np.arange(8, 10, 0.002) * np.sin(np.arange(0, 100, 0.1) * np.pi)

start_time = time.time()
output1 = calc(input, 500, 64, 448)
print(time.time()-start_time)

Output (elapsed time in seconds):
0.00018095970153808594
This was my first attempt:

def calc(x, i, K, N):   
    r = np.zeros(K)
    s = np.zeros(N)
    for k in range(1, K):
        o = math.floor((k + N) / 2)
        for n in range(N):
            s[n] = x[n - o + i] - x[n - o + i + k]
        s = np.square(s)
        r[k] = np.sum(s) / len(s)
    return r


Output (elapsed time in seconds):
0.0051839351654052734
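
A single `time.time()` measurement of a sub-millisecond call is quite noisy, so the two numbers above are only a rough comparison. As a minimal sketch, the standard-library `timeit` module can repeat the call and report the best run. The names `signal`, `number` and `best_us` are illustrative, and the test signal is built with `np.linspace` (an assumption, chosen to produce the same values as the question's `np.arange` expression with an exact length of 1000).

```python
import math
import timeit

import numpy as np


def calc(x, i, K, N):
    # Same single-loop version as in the question.
    r = np.empty(K)
    r[0] = 0
    for k in range(1, K):
        o = math.floor((k + N) / 2)
        x1 = x[i - o:i - o + N]
        x2 = x[i - o + k:i - o + N + k]
        r[k] = np.sum(np.square(x1 - x2)) / N
    return r


# Illustrative 1000-sample test signal (linspace keeps the length exact).
t = np.linspace(0.0, 100.0, 1000, endpoint=False)
signal = (8.0 + 0.02 * t) * np.sin(t * np.pi)

number = 100  # calls per measurement
times = timeit.repeat(lambda: calc(signal, 500, 64, 448),
                      number=number, repeat=5)
best_us = min(times) / number * 1e6  # best run, in microseconds per call
print(f"best of 5: {best_us:.1f} µs per call")
```

Taking the minimum of several repeats reduces the influence of other processes on the machine; the absolute numbers will still differ from system to system.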

kdfy810k  #1

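I couldn't find a way to speed up your calc function itself, but if you import the names directly as shown below, Python no longer has to look up each attribute on the module (np.square, math.floor, ...) on every call, so the code may run slightly faster: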

from math import floor
from numpy import empty, square, sum, arange, sin, pi
import time

def calc(x, i, K, N):
    r = empty(K)
    r[0] = 0
    for k in range(1, K):
        o = floor((k + N) / 2)
        x1 = x[i-o:i-o+N]
        x2 = x[i-o+k:i-o+N+k]
        s = square(x1-x2)
        r[k] = sum(s)/len(s)
    return r

input = arange(8, 10, 0.002) * sin(arange(0, 100, 0.1) * pi)

start_time = time.time()
output1 = calc(input, 500, 64, 448)
print(time.time()-start_time)

But maybe that is not what you were looking for.
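
To get a feel for how much the import style matters, here is a rough sketch that times a call through the module attribute against a call through a directly imported name. The variable names `t_attr` and `t_name` are illustrative, and the result depends on the Python version and machine; inside calc the array operations usually dominate, so any gain is likely to be small. Note also that `from numpy import sum` shadows Python's built-in `sum` in that module.

```python
import math
import timeit
from math import floor

n = 200_000  # number of calls per measurement

# Call through the module attribute: one extra lookup per call.
t_attr = timeit.timeit("math.floor(3.7)", globals={"math": math}, number=n)

# Call through the directly imported name.
t_name = timeit.timeit("floor(3.7)", globals={"floor": floor}, number=n)

print(f"math.floor: {t_attr / n * 1e9:.1f} ns per call")
print(f"floor:      {t_name / n * 1e9:.1f} ns per call")
```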

gdrx4gfi  #2

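I was able to vectorize the code by building the x1 and x2 arrays from a sliding-window view of x, using np.lib.stride_tricks.sliding_window_view. Keep in mind, though, that there is always a trade-off between speed and readability.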

import math
import numpy as np

def calc(x, i, K, N):
    r = np.empty(K)
    r[0] = 0
    for k in range(1, K):
        o = math.floor((k + N) / 2)
        x1 = x[i-o:i-o+N]
        x2 = x[i-o+k:i-o+N+k]
        s = np.square(x1-x2)
        r[k] = np.sum(s)/len(s)
    return r

def calc2(x, i, K, N):
    r = np.zeros(K)
    k = np.arange(1, K)
    o = (k + N)//2
    a = np.lib.stride_tricks.sliding_window_view(x, N)
    x1 = a[i-o]
    x2 = a[i-o+k]
    s = (x1 - x2)**2
    r[1:] = np.sum(s, axis=1)/N
    return r

arr_in = np.arange(8, 10, 0.002) * np.sin(np.arange(0, 100, 0.1) * np.pi)

Comparison:

%timeit calc(arr_in, 500, 64, 448)
466 µs ± 10.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%timeit calc2(arr_in, 500, 64, 448)
102 µs ± 2.08 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
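
To make the indexing in calc2 easier to follow, the sketch below first shows what `sliding_window_view` returns for a tiny array and then checks that the loop version and the vectorized version agree. It assumes NumPy 1.20 or newer (where `sliding_window_view` was added); the names `demo`, `signal` and `same` are illustrative, and the test signal is built with `np.linspace` instead of the `np.arange` expression so that its length is exactly 1000.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def calc(x, i, K, N):
    # Loop version: one pair of slices per lag k.
    r = np.zeros(K)
    for k in range(1, K):
        o = (k + N) // 2
        x1 = x[i - o:i - o + N]
        x2 = x[i - o + k:i - o + N + k]
        r[k] = np.sum(np.square(x1 - x2)) / N
    return r


def calc2(x, i, K, N):
    # Vectorized version: row j of `a` is the window x[j:j+N].
    r = np.zeros(K)
    k = np.arange(1, K)
    o = (k + N) // 2
    a = sliding_window_view(x, N)
    x1 = a[i - o]      # shape (K-1, N), one window per lag
    x2 = a[i - o + k]  # same windows shifted by k samples
    r[1:] = np.sum((x1 - x2) ** 2, axis=1) / N
    return r


# Each row is a length-3 window starting one element later.
demo = sliding_window_view(np.arange(6), 3)
print(demo)  # rows: [0 1 2], [1 2 3], [2 3 4], [3 4 5]

# Illustrative 1000-sample test signal.
t = np.linspace(0.0, 100.0, 1000, endpoint=False)
signal = (8.0 + 0.02 * t) * np.sin(t * np.pi)

same = np.allclose(calc(signal, 500, 64, 448), calc2(signal, 500, 64, 448))
print(same)
```

The view itself does not copy data, but indexing it with the integer arrays `i - o` and `i - o + k` does create two (K-1, N) arrays, so the vectorized version trades memory for fewer Python-level iterations.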


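If you are going to call your function many times, numba may be a good option. You can apply it to your original double loop, and even that is faster than my vectorized version above. The fastest seems to be the single-for-loop version.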

from numba import njit

@njit
def calc3(x, i, K, N):   
    r = np.zeros(K)
    s = np.zeros(N)
    for k in range(1, K):
        o = (k+N)//2
        for n in range(N):
            s[n] = x[n - o + i] - x[n - o + i + k]
        s = np.square(s)
        r[k] = np.sum(s)/N
    return r

@njit
def calc4(x, i, K, N):
    r = np.zeros(K)
    for k in range(1, K):
        o = (k + N)//2
        x1 = x[i-o:i-o+N]
        x2 = x[i-o+k:i-o+N+k]
        s = np.square(x1-x2)
        r[k] = np.sum(s)/N
    return r

# Call each function once so that numba compiles it before timing.
calc3(arr_in, 500, 64, 448)
calc4(arr_in, 500, 64, 448)


Timings:

%timeit calc3(arr_in, 500, 64, 448)
69.1 µs ± 466 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit calc4(arr_in, 500, 64, 448)
38.9 µs ± 54.3 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
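
One further variant that might be worth trying with numba is a plain scalar double loop that accumulates the squared differences directly, so that no temporary arrays are allocated at all. This is only a sketch: `calc5` and `calc_ref` are hypothetical names, the variant was not part of the timings above, and whether it beats calc4 would have to be measured. The `try`/`except` fallback is an assumption added so that the example still runs (slowly, in pure Python) when numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs without numba installed.
    def njit(func):
        return func


@njit
def calc5(x, i, K, N):
    # Hypothetical variant: scalar accumulation, no temporary arrays.
    r = np.zeros(K)
    for k in range(1, K):
        start = i - (k + N) // 2
        acc = 0.0
        for n in range(N):
            d = x[start + n] - x[start + n + k]
            acc += d * d
        r[k] = acc / N
    return r


def calc_ref(x, i, K, N):
    # Plain NumPy reference (the single-loop version) for checking results.
    r = np.zeros(K)
    for k in range(1, K):
        o = (k + N) // 2
        x1 = x[i - o:i - o + N]
        x2 = x[i - o + k:i - o + N + k]
        r[k] = np.sum(np.square(x1 - x2)) / N
    return r


# Illustrative 1000-sample test signal.
t = np.linspace(0.0, 100.0, 1000, endpoint=False)
signal = (8.0 + 0.02 * t) * np.sin(t * np.pi)

out5 = calc5(signal, 500, 64, 448)  # first call includes compilation
print(np.allclose(out5, calc_ref(signal, 500, 64, 448)))
```

As with calc3 and calc4, the first call should be excluded from any timing, because that is when numba compiles the function.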
