numpy: How can I divide these two sparse matrices?

9rbhqvlz posted 12 months ago in Other

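I'm trying to move dense matrix operations to sparse ones. When the arrays were dense, I used numpy broadcasting to divide an array of shape (591, 432) by an array of shape (432,), but how do I do this with sparse matrices?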

<591x432 sparse matrix of type '<class 'numpy.int64'>'
    with 3876 stored elements in Compressed Sparse Column format>

<1x432 sparse matrix of type '<class 'numpy.int64'>'
    with 432 stored elements in COOrdinate format>
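
For reference, the dense broadcasting described above can be sketched as follows. This is only an illustration: the small array `dense` and the names `col_max` and `scaled` are hypothetical stand-ins for the real (591, 432) and (432,) data.

```python
import numpy as np

# Hypothetical small stand-in for the dense (591, 432) array.
dense = np.array([[3.0, 4.0],
                  [5.0, 6.0],
                  [1.0, 2.0]])          # shape (3, 2)

# Per-column maxima, analogous to the (432,) array.
col_max = dense.max(axis=0)             # shape (2,)

# Broadcasting stretches col_max across every row of dense.
scaled = dense / col_max                # shape (3, 2)
print(scaled)
```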

When I try it with the dummy data below...

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

matrix = CountVectorizer().fit_transform(raw_documents=["test sentence.", "test sent 2."]).T
max_w = np.max(matrix, axis=0)
matrix / max_w


I get ValueError: inconsistent shapes. How can I divide these?

myzjeezk


If you really want to, you can divide by multiplying by the reciprocal.

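import numpy as np
from scipy.sparse import csc_matrix

A = csc_matrix([[3, 4], [5, 6]])
B = A.max(axis=0)                # column maxima as a 1x2 sparse matrix
res = A.multiply(B.power(-1.))   # multiply by the element-wise reciprocal
ref = A / B.todense()            # reference: divide by the dense row vector
np.allclose(res.todense(), ref)  # True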

But in your case, there is probably no speed advantage over dividing by B.todense().

import numpy as np
from scipy.sparse import csc_matrix, coo_matrix
rng = np.random.default_rng(452349345693456)

# generate arrays like yours
shape = (591, 432)
nnz = 3876
A = rng.random(size=shape)
b = np.partition(A.ravel(), nnz)[nnz]
A[A >= b] = 0
A = csc_matrix(A)
assert A.nnz == nnz
B = A.max(axis=0)

# compare solutions
res = A.multiply(B.power(-1.))
ref = A/B.todense()
np.allclose(res.todense(), ref)  # True

%timeit A.multiply(B.power(-1.))
# 1.3 ms ± 734 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit A/B.todense()
# 306 µs ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
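
If the result has to stay sparse, one alternative that is not part of the answer above is to right-multiply by a diagonal matrix of reciprocals. The following is a minimal sketch under some assumptions: `scale_columns` is a hypothetical helper name, the divisors are taken to be the column maxima, and columns whose divisor is 0 are simply left as zeros. It has not been timed against the two approaches above.

```python
import numpy as np
from scipy.sparse import csc_matrix, diags

def scale_columns(A, divisors):
    """Divide column j of sparse A by divisors[j], keeping the result sparse.

    Hypothetical helper: a divisor of 0 is mapped to a scale factor of 0,
    so that column comes out as all zeros instead of producing inf/nan.
    """
    divisors = np.asarray(divisors, dtype=float).ravel()
    inv = np.zeros_like(divisors)
    nonzero = divisors != 0
    inv[nonzero] = 1.0 / divisors[nonzero]
    # Right-multiplying by a diagonal matrix scales each column of A.
    return A @ diags(inv)

# Small example with an all-zero third column.
A = csc_matrix([[3, 4, 0], [5, 6, 0]])
col_max = A.max(axis=0).toarray().ravel()   # dense 1-D array, shape (3,)
res = scale_columns(A, col_max)
print(res.toarray())
```

Only the 1-D vector of divisors is densified here; `A` and the result stay in a sparse format.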
