How do I retain explicit zero values in a Python SciPy sparse COO (coordinate format) matrix?

Posted by mmvthczy on 2022-12-28 in Python


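I created a COO matrix that has zero values in its data array. When I inspect the data array of the new COO matrix, I can see those zero values. However, I cannot get the indices of those zeros. I use the nonzero() method to retrieve the indices, and the indices of the zero values are missing. Does anyone know how to get the indices of those zero values? If not, is this a bug in the COO code?
Below is sample code that reproduces the problem. The final assert fails because there are 7 values but only 6 non-zero indices. I understand that nonzero obviously excludes my zero values, but is there a similar method that also returns the explicit zeros?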

import scipy.sparse as sp

sparse_simple = sp.coo_matrix(
    [
        [1.1, 0, 1.1],
        [0, 1.1, 4.1],
        [1.1, 4.1, 1.1]
    ]
)

sparse_simple_data = sparse_simple.data
sparse_simple_nz = sparse_simple.nonzero()
sparse_simple_data[1] = 0
(n_rows, n_cols) = sparse_simple.shape
sparse_simple_with_explicit_close_to_zero = sp.coo_matrix(
    (sparse_simple_data, (sparse_simple_nz[0], sparse_simple_nz[1])),
    shape=(n_rows, n_cols)
)
num_explicit_vals = len(sparse_simple_with_explicit_close_to_zero.data)
nz_idcs = sparse_simple_with_explicit_close_to_zero.nonzero()
num_nzs = len(nz_idcs[0])

assert num_explicit_vals == num_nzs

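I looked through the documentation for SciPy sparse arrays for another method that extracts the indices of all stored values, zeros included, but did not find anything.
I do have a workaround, but it is a bit clumsy: I simply add a small number to every value in the data array, and then the approach works.
Adding the line below just above the one that creates the COO matrix makes the "zero" value detectable, because it is now a very small non-zero value. This fixed my code, but I don't like it.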

sparse_simple.data += 0.1e-09


Source: https://stackoverflow.com/questions/74929803/how-to-retain-expicit-zero-values-in-a-python-scipy-sparse-coo-coordinate-forma

1 Answer


jq6vz3qz1#

In [1]: import numpy as np
In [2]: from scipy import sparse

Your sample matrix:

In [3]: sparse_simple = sparse.coo_matrix(
   ...:     [
   ...:         [1.1, 0, 1.1],
   ...:         [0, 1.1, 4.1],
   ...:         [1.1, 4.1, 1.1]
   ...:     ]
   ...: )

In [4]: sparse_simple
Out[4]: 
<3x3 sparse matrix of type '<class 'numpy.float64'>'
    with 7 stored elements in COOrdinate format>

You have already worked with the data attribute; here it is together with the other two, row and col:

In [5]: sparse_simple.data, sparse_simple.row, sparse_simple.col
Out[5]: 
(array([1.1, 1.1, 1.1, 4.1, 1.1, 4.1, 1.1]),
 array([0, 0, 1, 1, 2, 2, 2], dtype=int32),
 array([0, 2, 1, 2, 0, 1, 2], dtype=int32))

Adding an "explicit" 0 does not change the "sparsity" of the matrix:

In [6]: sparse_simple.data[1] = 0; sparse_simple
Out[6]: 
<3x3 sparse matrix of type '<class 'numpy.float64'>'
    with 7 stored elements in COOrdinate format>

In [7]: sparse_simple.A
Out[7]: 
array([[1.1, 0. , 0. ],
       [0. , 1.1, 4.1],
       [1.1, 4.1, 1.1]])

But nonzero, as its name says, does not include this explicit 0:

In [8]: sparse_simple.nonzero()
Out[8]: 
(array([0, 1, 1, 2, 2, 2], dtype=int32),
 array([0, 1, 2, 0, 1, 2], dtype=int32))
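
The gap between stored entries and non-zero entries can be checked directly. This is a minimal sketch, assuming the same sample matrix as above: nnz counts every stored entry (explicit zeros included), while count_nonzero() counts only the entries whose value is not zero. The variable names are illustrative.

```python
from scipy import sparse

# Same sample matrix as above, with one stored value overwritten by 0.
m = sparse.coo_matrix([[1.1, 0, 1.1], [0, 1.1, 4.1], [1.1, 4.1, 1.1]])
m.data[1] = 0  # entry (0, 2) becomes an explicit zero

stored = m.nnz                     # number of stored entries, zeros included
nonzero = m.count_nonzero()        # number of entries with a non-zero value
explicit_zeros = stored - nonzero  # stored entries whose value is 0

print(stored, nonzero, explicit_zeros)
```

If the last number is greater than 0, nonzero() will return fewer indices than there are values in data.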

If we look at the code, we can see why:

In [9]: sparse_simple.nonzero??
Signature: sparse_simple.nonzero()
Source:   
    def nonzero(self):
        """nonzero indices

        Returns a tuple of arrays (row,col) containing the indices
        of the non-zero elements of the matrix.
        """

        # convert to COOrdinate format
        A = self.tocoo()
        nz_mask = A.data != 0
        return (A.row[nz_mask], A.col[nz_mask])

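It starts from the "raw" coo attributes but filters out every "explicit" 0, so we get only the non-zero values rather than the non-zero values plus the "explicit" 0s.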
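
Since nonzero() is just a mask applied to the stored coordinates, the indices of every stored value can be read from row and col themselves. This is a minimal sketch, assuming the same sample matrix; the variable names (all_idx, zero_idx) are illustrative and not part of SciPy.

```python
from scipy import sparse

m = sparse.coo_matrix([[1.1, 0, 1.1], [0, 1.1, 4.1], [1.1, 4.1, 1.1]])
m.data[1] = 0  # entry (0, 2) becomes an explicit zero

# row, col and data are parallel arrays:
# stored entry k sits at (m.row[k], m.col[k]) and has value m.data[k].
all_idx = list(zip(m.row.tolist(), m.col.tolist()))

# Positions of the explicit zeros only: the opposite mask to the one nonzero() uses.
zero_mask = m.data == 0
zero_idx = list(zip(m.row[zero_mask].tolist(), m.col[zero_mask].tolist()))

print(len(all_idx), len(m.data))
print(zero_idx)
```

With the indices taken this way, the number of indices matches the number of values in data, which is the condition the assert in the question checks.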
Sparse matrices also have an in-place method to "clean up" explicit 0s:

In [24]: sparse_simple.eliminate_zeros??
Signature: sparse_simple.eliminate_zeros()
Source:   
    def eliminate_zeros(self):
        """Remove zero entries from the matrix

        This is an *in place* operation
        """
        mask = self.data != 0
        self.data = self.data[mask]
        self.row = self.row[mask]
        self.col = self.col[mask]
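
The effect of the method can be seen by counting stored entries before and after. This is a minimal sketch, assuming the same sample matrix; it works on a copy so that the matrix with the explicit 0 is left untouched.

```python
from scipy import sparse

m = sparse.coo_matrix([[1.1, 0, 1.1], [0, 1.1, 4.1], [1.1, 4.1, 1.1]])
m.data[1] = 0  # entry (0, 2) becomes an explicit zero

cleaned = m.copy()         # eliminate_zeros() is in place, so work on a copy
cleaned.eliminate_zeros()  # drops the stored entries whose value is 0

print(m.nnz, cleaned.nnz)
# Both describe the same dense matrix; only the storage differs.
same_dense = bool((m.toarray() == cleaned.toarray()).all())
print(same_dense)
```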

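I see this method used more with csr. Changing the sparsity of that format is relatively expensive, so operations that create explicit 0s do not "clean up" after themselves; that can be done later.
Note that coo cannot be indexed; for example, sparse_simple[0,1] raises an error. csr can be.
So while it is possible to create matrices with explicit 0s, they are treated as something of an exception.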
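
For the csr side, here is a minimal sketch assuming the same sample matrix. It relies on tocsr() carrying the explicit 0 over as a stored entry, which is what the stored-entry count is there to show; the variable names are illustrative.

```python
from scipy import sparse

m = sparse.coo_matrix([[1.1, 0, 1.1], [0, 1.1, 4.1], [1.1, 4.1, 1.1]])
m.data[1] = 0  # entry (0, 2) becomes an explicit zero

csr = m.tocsr()
value = csr[0, 2]          # csr supports element indexing
stored_before = csr.nnz    # the explicit zero is still a stored entry

csr.eliminate_zeros()      # the deferred clean-up, done in place
stored_after = csr.nnz

print(value, stored_before, stored_after)
```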

Answered 2022-12-28
