我想利用numba自动并行化与@jit,以帮助我加快运行我的函数,因为我使用非常大的输入,我没有parralisation之前的经验,我尝试了几个解决方案使用多处理,我有很多问题,我张贴它作为问题在堆栈溢出,但几乎没有响应,所以我移动的东西,可以帮助我自动和看起来更简单,
下面是一个实现numba @jit的简单例子:
@jit(nopython=True, parallel=True)
def f(x, y):
return x + y
所以我试着在代码中简单地使用它:
from numba import njit, prange
@njit(parallel=True)
def defineMatrices(C0x, C0y, C0xy, C0yx, Cxx_err, Cyy_err, Cxy_err, Cyx_err, dCx, dCy, dCxy,dCyx):
Nk = len(dCx) # number of free parameters
Nm = len(C0x) # number of measurements
#print('NK:', Nk)
#print('Nm:', Nm)
Ax = np.zeros([Nk, Nk])
Ay = np.zeros([Nk, Nk])
Axy = np.zeros([Nk, Nk])
Ayx = np.zeros([Nk, Nk])
A = np.zeros([4 * Nk, Nk])
##
Bx = np.zeros([Nk, 1])
By = np.zeros([Nk, 1])
Bxy = np.zeros([Nk, 1])
Byx = np.zeros([Nk, 1])
B = np.zeros([4 * Nk, 1])
##
Dx = (Cxx_err[0:Nm, :] - C0x[0:Nm, :]) ### dk ?
Dy = (Cyy_err[0:Nm, :] - C0y[0:Nm, :])
Dxy = (Cxy_err[0:Nm, :] - C0xy[0:Nm, :])
Dyx = (Cyx_err[0:Nm, :] - C0yx[0:Nm, :])
for i in range(Nk): ## i represents each quad
# print('done A:', 100.* i ,'%')
for j in range(Nk):
Ax[i, j] = np.sum(np.dot(dCx[i], dCx[j].T))
Ay[i, j] = np.sum(np.dot(dCy[i], dCy[j].T))
Axy[i, j] = np.sum(np.dot(dCxy[i], dCxy[j].T))
Ayx[i, j] = np.sum(np.dot(dCyx[i], dCyx[j].T))
A[i, :] = Ax[i, :]
A[i + Nk, :] = Ay[i, :]
A[i + 2 * Nk, :] = Axy[i, :]
A[i + 3 * Nk, :] = Ayx[i, :]
##
for i in range(Nk):
Bx[i] = np.sum(np.dot(dCx[i], Dx.T))
By[i] = np.sum(np.dot(dCy[i], Dy.T))
Bxy[i] = np.sum(np.dot(dCxy[i], Dxy.T))
Byx[i] = np.sum(np.dot(dCyx[i], Dyx.T))
B[i] = Bx[i]
B[i + Nk] = By[i]
B[i + 2 * Nk] = Bxy[i]
B[i + 3 * Nk] = Byx[i]
return A, B
我的意见是:
所有C:是两个dim矩阵所有dC:三维矩阵
输出:
A:两个调光B:一维
但我看到了错误:
TypingError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17908/1125053646.py in <module>
1 start_time = time.time()
----> 2 A, B = defineMatrices(C0x, C0y, C0xy, C0yx, Cxx1, Cyy1, Cxy1, Cyx1, dCx, dCy, dCxy,dCyx)
3 end_time = time.time()
4
5 # Calculate the time taken and print it
~\Anaconda3\lib\site-packages\numba\core\dispatcher.py in _compile_for_args(self, *args, **kws)
418 e.patch_message(msg)
419
--> 420 error_rewrite(e, 'typing')
421 except errors.UnsupportedError as e:
422 # Something unsupported is present in the user code, add help info
~\Anaconda3\lib\site-packages\numba\core\dispatcher.py in error_rewrite(e, issue_type)
359 raise e
360 else:
--> 361 raise e.with_traceback(None)
362
363 argtypes = []
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function fill_diagonal at 0x0000019FF09F3940>) found for signature:
>>> fill_diagonal(array(float64, 1d, C), float64)
There are 2 candidate implementations:
- Of which 2 did not match due to:
Overload in function 'np_fill_diagonal': File: numba\np\arraymath.py: Line 2928.
With argument(s): '(array(float64, 1d, C), float64)':
Rejected as the implementation raised a specific error:
TypingError: The first argument must be at least 2-D (found 1-D)
raised from C:\Users\musa\Anaconda3\lib\site-packages\numba\np\arraymath.py:2958
During: resolving callee type: Function(<function fill_diagonal at 0x0000019FF09F3940>)
During: typing of call at C:\Users\musa\AppData\Local\Temp/ipykernel_17908/2593834973.py (30)
File "..\..\..\AppData\Local\Temp\ipykernel_17908\2593834973.py", line 30:
<source missing, REPL/exec in use?>
对于最小可归约示例:
C0x = [[2.969129, 3.065011, 3.294439],
[3.087682, 1.674345, 2.930011],
[3.355765, 2.863887, 2.588477]]
C0y = [[6.748940, 9.673919, 7.662236],
[9.527495, 11.328892, 10.147356],
[7.738452, 10.280770, 7.684719]]
C0xy = [[0.0, 0.0, 0.0],
[0.0, 0.0, 0.0],
[0.0, 0.0, 0.0]]
C0yx = [[0.0, 0.0, 0.0],
[0.0, 0.0, 0.0],
[0.0, 0.0, 0.0]]
Cxx1 = [[2.970383, 3.065395, 3.295505],
[3.088111, 1.674282, 2.930299],
[3.356823, 2.864131, 2.589346]]
Cyy1 = [[6.733786, 9.656677, 7.646381],
[9.510379, 11.309615, 10.129516],
[7.722598, 10.262803, 7.668157]]
Cxy1 = [[0.0, 0.0, 0.0],
[0.0, 0.0, 0.0],
[0.0, 0.0, 0.0]]
Cyx1 = [[0.0, 0.0, 0.0],
[0.0, 0.0, 0.0],
[0.0, 0.0, 0.0]]
dCx = [array([[-0.88391347, -0.9165879 , -1.00445679],
[-0.9102315 , -0.94390882, -1.03437531],
[-0.99567547, -1.03249243, -1.13146452]]),
array([[-0.93911752, -0.49819029, -0.88115751],
[-0.51945798, -0.2755809 , -0.48739065],
[-0.87413297, -0.46370905, -0.82018765]])]
dCy = [array([[4.54319981, 6.52256593, 5.20680836],
[6.38565572, 9.16779592, 7.31840445],
[5.19995576, 7.46547485, 5.95950073]]),
array([[ 9.36071422, 11.09190807, 9.94111537],
[10.98180048, 13.0128158 , 11.66270877],
[ 9.93855597, 11.7766108 , 10.55478873]])]
dCxy = [array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]]),
array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])]
dCyx = [array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]]),
array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])]
A, B = defineMatrices(C0x, C0y, C0xy, C0yx, Cxx1, Cyy1, Cxy1, Cyx1, dCx, dCy, dCxy,dCyx)
输出:
A = [[26.203265, 17.026796],
[17.026796, 11.763451],
[1138.015961, 1913.702747],
[1913.702747, 3238.589801],
[0.0, 0.0],
[0.0, 0.0],
[0.0, 0.0],
[0.0, 0.0]]
B = [-0.016327, -0.011957, -2.966823, -5.028449, 0.0, 0.0, 0.0, 0.0]
2条答案
按热度按时间iszxjhcz1#
在你的例子中,
dCx
是一个2(3,3)数组的列表。让我们做一个这样的列表:双循环的
Ax
部分:我想说的是,你可以用一个“向量化”的操作来代替双循环。
einsum
,matmul
,甚至dot
都可以在更高的维度上工作。你的列表可以被转换成一个3d数组:
下面是一个使用
einsum
的Ax
的等价物:进一步考虑,我也可以使用
np.dot
:matmul
版本有点混乱(它对“批次”维度有不同的规则):其中
einsum
是最快的,并且明显优于双嵌套使用示例
dCx
:dw1jzc5e2#
我已经实现了np.einsum解决方案:我在这里使用的dCx具有形状(40,40,40),A具有形状(160,40)
我用两个for循环比较时间和我的代码
我没想到第二个解决方案所花费的时间会比for循环更长