What is the best way to efficiently compute the trace of a (sparse) matrix product in Python with scipy?

qzwqbdag · published 2022-11-09 · in Python

I want to compute the Hilbert-Schmidt inner product of two matrices.
For two matrices A and B, this operation requires the matrix product of the Hermitian conjugate of A with B, followed by the trace (the sum along the diagonal). Since only the diagonal elements are needed, the off-diagonal terms of the product are wasted work, so computing the full matrix product is pointless.
In practice, what needs to be computed is:

Tr(A†B) = Σ_{ij} A*_{ij} B_{ij}

where i indexes rows and j indexes columns.
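This identity can be sanity-checked with small dense NumPy arrays (a quick check, not part of the original question; the matrix size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))

# Full-product form: Tr(A† B)
full = np.trace(A.conj().T @ B)

# Elementwise form: Σ_ij conj(A_ij) * B_ij
elementwise = np.sum(A.conj() * B)

assert np.allclose(full, elementwise)
```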
What is the fastest way to do this for sparse matrices? I found the following similar post:
What is the best way to compute the trace of a matrix product in numpy?
Currently, I am using:

def hilbert_schmidt_inner_product(mat1, mat2):

    ## find the (i, j) indices of the nonzero entries of each matrix
    mat1_ij = set(tuple(x) for x in zip(*mat1.nonzero()))
    mat2_ij = set(tuple(x) for x in zip(*mat2.nonzero()))

    ## keep only the (i, j) indices at which both matrices are nonzero
    common_ij = np.array(list(mat1_ij & mat2_ij))

    ## select the common (i, j) entries from both (each becomes a 1D array)
    mat1_survived = np.array(mat1[common_ij[:, 0], common_ij[:, 1]])[0]
    mat2_survived = np.array(mat2[common_ij[:, 0], common_ij[:, 1]])[0]

    ## 1D dot product of the common nonzero elements, conjugating the first
    trace = np.dot(mat1_survived.conj(), mat2_survived)
    return trace

However, this is slower than:

import numpy as np
sum((mat1.conj().T@mat2).diagonal())

which forms the full matrix product before taking the trace, doing pointless work to find the off-diagonal elements. Is there a better way to do this?
I am benchmarking with:

import numpy as np
from scipy.sparse import rand

Dimension = 2**12

A = rand(Dimension, Dimension, density=0.001, format='csr')
B = rand(Dimension, Dimension, density=0.001, format='csr')

Running a few tests, I find:

%timeit hilbert_schmidt_inner_product(A,B)
49.2 ms ± 3.13 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit sum((A.conj().T@B).diagonal())
1.48 ms ± 32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.einsum('ij,ij->', A.conj().todense(), B.todense())
53.9 ms ± 2.74 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Answer 1 (sqxo8psd)

Another option is (A.conj().multiply(B)).sum():

In [111]: Dimension = 2**12

In [112]: A = rand(Dimension, Dimension, density=0.001, format='csr')
     ...: B = rand(Dimension, Dimension, density=0.001, format='csr')

Compared with sum((A.conj().T @ B).diagonal()):

In [113]: sum((A.conj().T @ B).diagonal())
Out[113]: 4.152218112255467

In [114]: (A.conj().multiply(B)).sum()
Out[114]: 4.152218112255466

In [115]: %timeit sum((A.conj().T @ B).diagonal())
2.7 ms ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [116]: %timeit (A.conj().multiply(B)).sum()
1.12 ms ± 4.39 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Of course, for larger values of Dimension the relative performance difference is much bigger (O(Dimension**3) for the full dense matrix product versus O(Dimension**2) for the elementwise product; for sparse matrices both costs scale with the number of nonzeros instead):

In [119]: Dimension = 2**14

In [120]: A = rand(Dimension, Dimension, density=0.001, format='csr')
     ...: B = rand(Dimension, Dimension, density=0.001, format='csr')

In [121]: sum((A.conj().T @ B).diagonal())
Out[121]: 69.23254213582365

In [122]: (A.conj().multiply(B)).sum()
Out[122]: 69.23254213582364

In [123]: %timeit sum((A.conj().T @ B).diagonal())
124 ms ± 1.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [124]: %timeit (A.conj().multiply(B)).sum()
8.67 ms ± 63.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
