numpy: cmds algorithm cannot handle infinity (inf) in the distance matrix

qpgpyjmq posted 11 months ago in Other

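If the input type is 'distance', the matrix X may contain inf, which marks pairs of points that are disconnected in the graph. This is a perfectly valid case, but the line `B = -0.5 * H @ D**2 @ H` throws an error when it occurs.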

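import numpy as np
# Assumed source of euclidean_distances; the original post does not show its imports
from sklearn.metrics.pairwise import euclidean_distances

def cmds(X, n_dim, input_type='raw'):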
    """
    Classical(linear) multidimensional scaling (MDS)
    
    Parameters
    ----------
    X: (d, n) array or (n,n) array
        input data. The data are placed in column-major order. 
        That is, samples are placed in the matrix (X) as column vectors
        d: dimension of points
        n: number of points
        
    n_dim: dimension of target space
    
    input_type: it indicates whether data are raw or distance
        - raw: raw data. (d, n) array, one sample per column.
        - distance: precomputed distances between the data. (n,n) array.
    Returns
    -------
    Y: (n_dim, n) array. projected embeddings.
    evals: (n_dim) eigen values
    evecs: corresponding eigen vectors in column vectors
    """

    if input_type == 'distance':
        D = X
    elif input_type == 'raw':
        Xt = X.T
        D = euclidean_distances(Xt,Xt)
        
    # Centering matrix
    H = np.eye(D.shape[0]) - np.ones(D.shape) / D.shape[0]

    # Double-center the distance matrix
    B = -0.5 * H @ D**2 @ H

    # Eigen decomposition
    evals, evecs = np.linalg.eigh(B)

    # Sorting eigenvalues and eigenvectors in decreasing order
    sort_indices = np.argsort(evals)[::-1]
    evals = evals[sort_indices]
    evecs = evecs[:, sort_indices]

    # Selecting top n_dim eigenvectors
    evecs = evecs[:, :n_dim]

    # Projecting data to the new space
    Y = np.sqrt(np.diag(evals[:n_dim])) @ evecs.T

    return Y, evals, evecs

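This is a perfectly valid case: if two points are not connected, the distance between them is set to inf. But the code then fails when computing the double-centered matrix and the eigenvalues. Please help me handle this case.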
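The failure can be reproduced on a toy matrix. The sketch below is only an illustration, with a made-up 3-point distance matrix: double-centering mixes +inf and -inf terms, so the result contains NaN, which `np.linalg.eigh` cannot decompose meaningfully.

```python
import numpy as np

# Toy distance matrix (illustrative only): point 2 is disconnected
# from points 0 and 1.
D = np.array([[0.0, 1.0, np.inf],
              [1.0, 0.0, np.inf],
              [np.inf, np.inf, 0.0]])

n = D.shape[0]
H = np.eye(n) - np.ones((n, n)) / n  # centering matrix

# Silence floating-point warnings so the result can be inspected.
with np.errstate(all="ignore"):
    B = -0.5 * H @ D**2 @ H

# inf - inf is undefined, so B ends up containing NaN entries.
has_nan = bool(np.isnan(B).any())
print(has_nan)
```

Depending on the NumPy and LAPACK versions, passing such a matrix to `np.linalg.eigh` may raise an error or return NaN values, so the NaN check is the more portable way to detect the problem.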

Answer #1 by z8dt9xmd

Here is the modified code:

import numpy as np
# Assumed source of euclidean_distances; the original post does not show its imports
from sklearn.metrics.pairwise import euclidean_distances

def cmds(X, n_dim, input_type='raw'):
    """
    Classical(linear) multidimensional scaling (MDS)

    Parameters
    ----------
    X: (d, n) array or (n,n) array
        input data. The data are placed in column-major order.
        That is, samples are placed in the matrix (X) as column vectors
        d: dimension of points
        n: number of points

    n_dim: dimension of target space

    input_type: it indicates whether data are raw or distance
        - raw: raw data. (d, n) array, one sample per column.
        - distance: precomputed distances between the data. (n,n) array.
    Returns
    -------
    Y: (n_dim, n) array. projected embeddings.
    evals: (n_dim) eigen values
    evecs: corresponding eigen vectors in column vectors
    """

    if input_type == 'distance':
        # Work on a float copy so the caller's matrix is not modified
        D = np.array(X, dtype=float)
    elif input_type == 'raw':
        Xt = X.T
        D = euclidean_distances(Xt, Xt)

    # Check for inf values in the distance matrix
    inf_mask = np.isinf(D)
    if np.any(inf_mask):
        # Replace inf values with a large but finite value.
        # np.finfo(D.dtype).max would overflow back to inf in D**2 below,
        # so use a multiple of the largest finite distance instead
        # (the factor 10 is an arbitrary choice).
        D[inf_mask] = 10 * D[~inf_mask].max()

    # Centering matrix
    H = np.eye(D.shape[0]) - np.ones(D.shape) / D.shape[0]

    # Double-center the distance matrix
    B = -0.5 * H @ D**2 @ H

    # Eigen decomposition
    evals, evecs = np.linalg.eigh(B)

    # Sorting eigenvalues and eigenvectors in decreasing order
    sort_indices = np.argsort(evals)[::-1]
    evals = evals[sort_indices]
    evecs = evecs[:, sort_indices]

    # Selecting top n_dim eigenvectors
    evecs = evecs[:, :n_dim]

    # Projecting data to the new space
    Y = np.sqrt(np.diag(evals[:n_dim])) @ evecs.T

    return Y, evals, evecs

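The modified code checks the distance matrix for inf values and replaces them with a large but finite value before the centering and eigendecomposition steps, so the code no longer fails on disconnected points in the graph. The replacement must stay finite after squaring, because the code computes D**2: np.finfo(D.dtype).max itself overflows to inf when squared.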
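Replacing inf with a finite number makes the algorithm run, but the chosen value also shapes the embedding. A different option, not part of the answer above, is to embed each connected component on its own. The sketch below assumes D holds shortest-path distances, so that all distances inside a component are finite. The names `classical_mds` and `embed_per_component` are hypothetical helpers introduced for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def classical_mds(D, n_dim):
    """Classical MDS on a finite (n, n) distance matrix; returns (n_dim, n)."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * H @ (D ** 2) @ H
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1][:n_dim]
    # Clip negative eigenvalues so the square root stays real.
    scale = np.sqrt(np.clip(evals[order], 0.0, None))
    return scale[:, None] * evecs[:, order].T


def embed_per_component(D, n_dim):
    """Hypothetical helper: run classical MDS separately per component.

    Two points count as connected when their distance is finite.
    Returns {component_label: (point_indices, embedding)}.
    """
    D = np.asarray(D, dtype=float)
    graph = csr_matrix(np.isfinite(D).astype(float))
    n_components, labels = connected_components(graph, directed=False)
    embeddings = {}
    for k in range(n_components):
        idx = np.flatnonzero(labels == k)
        embeddings[k] = (idx, classical_mds(D[np.ix_(idx, idx)], n_dim))
    return embeddings


# Two pairs of points with no path between the pairs.
inf = np.inf
D = np.array([[0.0, 3.0, inf, inf],
              [3.0, 0.0, inf, inf],
              [inf, inf, 0.0, 4.0],
              [inf, inf, 4.0, 0.0]])
result = embed_per_component(D, n_dim=1)
```

Each component gets its own coordinate system, so the embeddings say nothing about how far apart the components are. A component with fewer points than `n_dim` yields fewer than `n_dim` rows.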
