Python稀疏矩阵C，其元素c_ij = sum_j min(a_ij，B_ji)来自稀疏矩阵A和B

我有两个稀疏矩阵A和B，其中A和B的列数相同（但行数可能不同），我试图得到一个稀疏矩阵C，其元素c_ij = sum_j min（a_ij，b_ji），其中a_ij和b_ij分别是A和B的元素。
这能高效地完成吗？理想情况下是稀疏的吗？即使是密集的也可以，但涉及的矩阵将是巨大的（数万行）并且非常稀疏。

- 详细信息：**类似于标准（数学）矩阵乘法，但不是对元素的乘积求和，而是对元素的最小值求和。

举一个例子，使用标准的矩阵乘法A*B给出元素c_ij = sum_ja_ij * b_ji，其中i和j分别作为行和列索引。
例如，

import numpy as np
from scipy.sparse import csr_matrix

A = csr_matrix([[1, 2, 0], 
                [0, 0, 3], 
                [4, 0, 5]])

B = csr_matrix([[2, 0, 0], 
                [0, 0, 3], 
                [6, 0, 0]])

C = A * B
print(f"\nC = A * B = \n{C}")

给予

>>> 
C = A * B = 
  (0, 2)    6
  (0, 0)    2
  (1, 0)    18
  (2, 0)    38

其中
c_（31）= 38 = j和（a_（3j）* b_（j1））= 4 * 2 + 0 * 0 + 5 * 6 = 38。
对于C中的每个元素，我需要最小值的和min（a_ij * b_ji），而不是乘积a_ij * b_ji的和：
c_31 =和_j最小（a_3j，b_j1）=最小（4，2）+最小（0，0）+最小（5，6）= 7。

在普通的密集数组中，这在三维数组中会更容易完成，但在稀疏形式中不可用。

import numpy as np
from scipy.sparse import csr_array, coo_array

# If you're able to start in COO format instead of CSR format, it will make the code below simpler
A = csr_array([
    [1, 2, 0],
    [0, 0, 3],
    [4, 0, 5],
    [4, 0, 5],
    [1, 1, 6],
])
B = csr_array([
    [2, 0, 0, 8],
    [0, 0, 3, 2],
    [6, 0, 0, 4],
])
m, n = A.shape
n, p = B.shape

# first min argument based on A
a_coo = A.tocoo(copy=False)
a_row_idx = ((a_coo.col + a_coo.row*(n*p))[None, :] + np.arange(0, n*p, n)[:, None]).ravel()
a_min = coo_array((
    np.broadcast_to(A.data, (p, A.size)).ravel(),  # data: row-major repeat
    (
        a_row_idx,                 # row indices
        np.zeros_like(a_row_idx),  # column indices
    ),
), shape=(m*n*p, 2))

# test; deleteme
a_min_dense = np.zeros((m*n*p, 2), dtype=int)
a_min_dense[:, 0] = np.repeat(A.toarray(), p, axis=0).ravel()
assert np.allclose(a_min_dense, a_min.toarray())

# second min argument based on B
b_coo = B.tocoo(copy=False)
b_row_idx = ((b_coo.row + b_coo.col*n)[None, :] + np.arange(0, m*n*p, n*p)[:, None]).ravel()
b_min = coo_array((
    np.broadcast_to(B.data, (m, B.size)).ravel(),  # data: row-major repeat
    (
        b_row_idx,                # row indices
        np.ones_like(b_row_idx),  # column indices
    ),
), shape=(m*n*p, 2))

# test; deleteme
b_min_dense = np.zeros((m*n*p, 2), dtype=int)
b_min_dense[:, 1] = np.tile(B.T.toarray().ravel(), (1, m)).ravel()
assert np.allclose(b_min_dense, b_min.toarray())

# Alternative to this: use two one-dimensional columns and hstack
ab = (a_min + b_min).min(axis=1)
addend = ab.reshape((-1, n))

# scipy sparse summation input is sparse, but output is NOT SPARSE. It's an np.matrix.
# If that matters, you need to run a more complex operation on 'ab' that manipulates indices.
# deleteme
total = addend.sum(axis=1).reshape((m, p))

# Example true sparse sum
separators = np.insert(
    np.nonzero(np.diff(addend.row))[0] + 1,
    0, 0,
)
sparse_total = coo_array((
    np.add.reduceat(addend.data, separators),
    (
        addend.row[separators],
        np.zeros_like(separators),
    )
), shape=(m*p, 1)).reshape((m, p))

# test; deleteme
ab_dense = (a_min_dense + b_min_dense).min(axis=1)
sum_dense = ab_dense.reshape((-1, n)).sum(axis=1).reshape((m, p))
assert np.allclose(ab.toarray().ravel(), ab_dense)
assert np.allclose(sum_dense, total)
assert np.allclose(sum_dense, sparse_total.toarray())

print(sum_dense)

其他注意事项：不要使用_matrix;它已被弃用，取而代之的是_array。
上面的操作是故意对与你的不同的示例矩阵进行的，因为当所有维度的大小都相同时，不可能可靠地开发数组操作代码。但是，当我将它们修改为与你的原始数组相同时，我得到

[[1 0 2]
 [3 0 0]
 [7 0 0]]

Python稀疏矩阵C，其元素c_ij = sum_j min(a_ij，B_ji)来自稀疏矩阵A和B

1条答案

相关问题

热门标签

最新问答