numpy: Why is my array-slicing variant slower than element-wise operations in Python?

lnlaulya, posted 9 months ago in Python

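I am referring to this question, which already has a good answer. However, unnecessary operations were identified there (see the discussion under that post), and I was curious whether I could eliminate them.
In the meantime I found an approach that avoids the unnecessary multiplications (by using mask indexing) and gives the same result. The code is below.
Variant 1 is the original.
In variant 2 I tried Python slicing combined with masks, not only to write the two loops in a nicer and more compact way, but mainly in the hope that it would be faster. Instead it turned out to be about 30% slower. To be honest, the original code is more readable, but I had hoped for a significant improvement over the double loop.
Why isn't that the case?
Or, asked the other way around: in which situations are slicing operations faster than element-wise operations? Are they just syntactic sugar with significant internal overhead? I assumed they are implemented in C/C++ and must therefore be faster than a manual i, j loop in Python.
Output: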

D:\python\animation>python test.py
used time for variant 1: 1.0377624034881592
used time for variant 2: 1.30381441116333

D:\python\animation>python test.py
used time for variant 1: 0.8954949378967285
used time for variant 2: 1.251044750213623

D:\python\animation>python test.py
used time for variant 1: 0.9750621318817139
used time for variant 2: 1.3896379470825195

Code:

import numpy as np
import numpy.ma as ma
import time

def test():

 
    f = np.array([
        [0,   0,   0,   0,   0,   0,      0], 
        [0,   1,   3,   6 ,  4,   2,      0], 
        [0,   2,   4,   7 ,  6,   4,      0],    
        [0,   0,   0,   0,   0,   0,      0]
        ])
        

    u = np.array([
        [0,   0,    0,    0,   0,   0,     0], 
        [0,   0.5,  1,    0,  -1,  -0.5,   0], 
        [0,   0.7,  1.1,  0,  -1,  -0.4,   0], 
        [0,   0,    0,    0,   0,   0,     0], 
        ])
        
    
    # calculate : variant 1
    x = np.zeros_like(f)
    
    maxcount = 100000
    
    start = time.time()

    for count in range(maxcount):
        for i in range(1,u.shape[0]-1):
            for j in range(1,u.shape[1]-1):
                if u[i,j] > 0: 
                    x[i,j] = u[i,j]*(f[i,j]-f[i,j-1])
                else:
                    x[i,j] = u[i,j]*(f[i,j+1]-f[i,j])
                
    end = time.time()
    print("used time for variant 1:", end-start)
                
                           
    
    # calculate : variant 2

    y = np.zeros_like(f)  

    
    start = time.time()
    
    for count in range(maxcount):
        maskl = (u[1:-1, 1:-1] > 0)
        maskr = ~maskl 
        diff = f[1:-1, 1:]  - f[1:-1, 0:-1]
        
        (y[1:-1, 1:-1])[maskl]  = (u[1:-1, 1:-1 ])[maskl]  * (diff[:, :-1])[maskl]
        (y[1:-1, 1:-1])[maskr]  = (u[1:-1, 1:-1 ])[maskr]  * (diff[:, 1: ])[maskr]
    
    end = time.time()
    print("used time for variant 2:", end-start)
    
    np.testing.assert_array_equal(x, y)

test()


"Pre-fetching" the slices of u and y makes it slightly better, but not by much:

for count in range(maxcount):
    maskl = (u[1:-1, 1:-1] > 0)
    maskr = ~maskl
    diff = f[1:-1, 1:] - f[1:-1, 0:-1]

    yy = y[1:-1, 1:-1]   # <<-- view into y, fetched once
    uu = u[1:-1, 1:-1]   # <<-- view into u, fetched once

    yy[maskl] = uu[maskl] * (diff[:, :-1])[maskl]
    yy[maskr] = uu[maskr] * (diff[:, 1:])[maskr]

Answer 1 (sbdsn5lh)

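You can easily speed this up with numba. Also, as mentioned in the comments, it depends on how large your input arrays are: the larger the arrays, the faster the second variant becomes relative to the first.
Here is a quick benchmark: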

import perfplot

import numpy as np
from numba import njit

def variant_1(u, f):
    x = np.zeros_like(f)

    for i in range(1, u.shape[0] - 1):
        for j in range(1, u.shape[1] - 1):
            if u[i, j] > 0:
                x[i, j] = u[i, j] * (f[i, j] - f[i, j - 1])
            else:
                x[i, j] = u[i, j] * (f[i, j + 1] - f[i, j])

    return x

def variant_2(u, f):
    y = np.zeros_like(f)

    maskl = u[1:-1, 1:-1] > 0
    maskr = ~maskl
    diff = f[1:-1, 1:] - f[1:-1, 0:-1]

    (y[1:-1, 1:-1])[maskl] = (u[1:-1, 1:-1])[maskl] * (diff[:, :-1])[maskl]
    (y[1:-1, 1:-1])[maskr] = (u[1:-1, 1:-1])[maskr] * (diff[:, 1:])[maskr]

    return y

@njit
def variant_numba(u, f):
    x = np.zeros_like(f)

    for i in range(1, u.shape[0] - 1):
        for j in range(1, u.shape[1] - 1):
            if u[i, j] > 0:
                x[i, j] = u[i, j] * (f[i, j] - f[i, j - 1])
            else:
                x[i, j] = u[i, j] * (f[i, j + 1] - f[i, j])

    return x

f = np.array(
    [
        [0, 0, 0, 0, 0, 0, 0],
        [0, 1, 3, 6, 4, 2, 0],
        [0, 2, 4, 7, 6, 4, 0],
        [0, 0, 0, 0, 0, 0, 0],
    ]
)

u = np.array(
    [
        [0, 0, 0, 0, 0, 0, 0],
        [0, 0.5, 1, 0, -1, -0.5, 0],
        [0, 0.7, 1.1, 0, -1, -0.4, 0],
        [0, 0, 0, 0, 0, 0, 0],
    ]
)

x1 = variant_1(u, f)
x2 = variant_2(u, f)
x3 = variant_numba(u, f)

assert np.allclose(x1, x2)
assert np.allclose(x1, x3)

def setup_u_f(n):
    return np.tile(u, (n, n)), np.tile(f, (n, n))

perfplot.show(
    setup=setup_u_f,
    kernels=[
        lambda u, f: variant_1(u, f),
        lambda u, f: variant_2(u, f),
        lambda u, f: variant_numba(u, f),
    ],
    labels=["variant_1", "variant_2", "variant_numba"],
    n_range=[1, 2, 5, 10, 20, 50, 100],
    xlabel="np.tile(_, (n, n))",
    logx=True,
    logy=True,
)

This produces the following chart:
[benchmark chart: image not preserved]
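
For readers without perfplot or numba installed, the size dependence can be sketched with only timeit and NumPy. This is a minimal sketch, not the benchmark above: the names variant_loop and variant_sliced, the float dtype, and the tile factors are assumptions chosen for illustration.

```python
import timeit

import numpy as np


def variant_loop(u, f):
    """Element-wise double loop over the interior cells (variant 1)."""
    x = np.zeros_like(f)
    for i in range(1, u.shape[0] - 1):
        for j in range(1, u.shape[1] - 1):
            if u[i, j] > 0:
                x[i, j] = u[i, j] * (f[i, j] - f[i, j - 1])
            else:
                x[i, j] = u[i, j] * (f[i, j + 1] - f[i, j])
    return x


def variant_sliced(u, f):
    """Slicing combined with boolean masks (variant 2)."""
    y = np.zeros_like(f)
    yy = y[1:-1, 1:-1]  # view into y, so masked writes land in y
    uu = u[1:-1, 1:-1]
    maskl = uu > 0
    maskr = ~maskl
    diff = f[1:-1, 1:] - f[1:-1, :-1]
    yy[maskl] = uu[maskl] * diff[:, :-1][maskl]
    yy[maskr] = uu[maskr] * diff[:, 1:][maskr]
    return y


# Arrays from the question, stored as floats (assumption) so that the
# products are not truncated when written into the result array.
base_f = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 3, 6, 4, 2, 0],
    [0, 2, 4, 7, 6, 4, 0],
    [0, 0, 0, 0, 0, 0, 0],
], dtype=float)
base_u = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0.5, 1, 0, -1, -0.5, 0],
    [0, 0.7, 1.1, 0, -1, -0.4, 0],
    [0, 0, 0, 0, 0, 0, 0],
], dtype=float)

timings = {}
for n in (1, 4, 16):  # tile factors picked arbitrarily for illustration
    u = np.tile(base_u, (n, n))
    f = np.tile(base_f, (n, n))
    # Both variants must agree before their timings are compared.
    assert np.array_equal(variant_loop(u, f), variant_sliced(u, f))
    t_loop = min(timeit.repeat(lambda: variant_loop(u, f), number=5, repeat=3))
    t_sliced = min(timeit.repeat(lambda: variant_sliced(u, f), number=5, repeat=3))
    timings[n] = (t_loop, t_sliced)
    print(f"n={n:3d} shape={u.shape} loop={t_loop:.5f}s sliced={t_sliced:.5f}s")
```

I would expect the sliced variant to pull ahead as n grows, in line with the chart, but the exact crossover point depends on the machine and the NumPy version.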

Answer 2 (wpcxdonn)

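My results are not quite the same as yours, possibly because I used float arrays rather than integer arrays (or because there is a bug in my program), but you may find something like this simpler: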

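temp = np.zeros_like(f)
temp[:, 1:] = f[:, :-1]  # temp[a, b] = f[a, b - 1]
x1 = u * (f - temp)      # backward difference, used where u > 0
temp[:, :-1] = f[:, 1:]  # temp[a, b] = f[a, b + 1]
# Note: the last column of temp still holds values from the first shift;
# that is harmless here only because u is 0 on the border.
x2 = u * (temp - f)      # forward difference, used where u <= 0
result = np.where(u > 0, x1, x2)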

I think this states your intent a bit more clearly and doesn't involve a lot of masking.
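
A self-contained version of the same idea, restricted to the interior cells like the loops in the question, might look like the sketch below. The names loop_reference and where_variant and the float dtype are assumptions for illustration, not part of the original code.

```python
import numpy as np


def loop_reference(u, f):
    """Double loop from the question, used only as a reference result."""
    x = np.zeros_like(f)
    for i in range(1, u.shape[0] - 1):
        for j in range(1, u.shape[1] - 1):
            if u[i, j] > 0:
                x[i, j] = u[i, j] * (f[i, j] - f[i, j - 1])
            else:
                x[i, j] = u[i, j] * (f[i, j + 1] - f[i, j])
    return x


def where_variant(u, f):
    """Pick the backward or forward difference per cell with np.where."""
    x = np.zeros_like(f)
    uu = u[1:-1, 1:-1]
    backward = f[1:-1, 1:-1] - f[1:-1, :-2]  # f[i, j] - f[i, j - 1]
    forward = f[1:-1, 2:] - f[1:-1, 1:-1]    # f[i, j + 1] - f[i, j]
    x[1:-1, 1:-1] = uu * np.where(uu > 0, backward, forward)
    return x


# Float arrays (assumption), so the result array does not truncate products.
f = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 3, 6, 4, 2, 0],
    [0, 2, 4, 7, 6, 4, 0],
    [0, 0, 0, 0, 0, 0, 0],
], dtype=float)
u = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0.5, 1, 0, -1, -0.5, 0],
    [0, 0.7, 1.1, 0, -1, -0.4, 0],
    [0, 0, 0, 0, 0, 0, 0],
], dtype=float)

result = where_variant(u, f)
assert np.array_equal(result, loop_reference(u, f))
print(result)
```

Unlike the masked assignments, this computes both differences for every interior cell and discards one of them, so it trades a little extra arithmetic for fewer indexing operations.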
