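If the input type is "distance", the matrix X may contain inf, which indicates disconnected points in the graph. This is a perfectly valid case, but "B = -0.5 * H @ D**2 @ H" throws an error in this situation.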
import numpy as np
# Assumed source of euclidean_distances (the original snippet does not show its import)
from sklearn.metrics.pairwise import euclidean_distances


def cmds(X, n_dim, input_type='raw'):
    """
    Classical (linear) multidimensional scaling (MDS)

    Parameters
    ----------
    X: (d, n) array or (n, n) array
        input data. The data are placed in column-major order.
        That is, samples are placed in the matrix (X) as column vectors
        d: dimension of points
        n: number of points
    n_dim: dimension of target space
    input_type: it indicates whether data are raw or distance
        - raw: raw data. (d, n) array.
        - distance: precomputed distances between the data. (n, n) array.

    Returns
    -------
    Y: (n_dim, n) array. projected embeddings.
    evals: (n,) array. all eigenvalues, sorted in decreasing order
    evecs: (n, n_dim) array. top n_dim eigenvectors as column vectors
    """
    if input_type == 'distance':
        D = X
    elif input_type == 'raw':
        Xt = X.T
        D = euclidean_distances(Xt, Xt)
    else:
        raise ValueError("input_type must be 'raw' or 'distance'")
    # Centering matrix
    H = np.eye(D.shape[0]) - np.ones(D.shape) / D.shape[0]
    # Double-center the squared distance matrix
    B = -0.5 * H @ D**2 @ H
    # Eigen decomposition
    evals, evecs = np.linalg.eigh(B)
    # Sorting eigenvalues and eigenvectors in decreasing order
    sort_indices = np.argsort(evals)[::-1]
    evals = evals[sort_indices]
    evecs = evecs[:, sort_indices]
    # Selecting top n_dim eigenvectors
    evecs = evecs[:, :n_dim]
    # Projecting data to the new space
    Y = np.sqrt(np.diag(evals[:n_dim])) @ evecs.T
    return Y, evals, evecs
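Since this is a perfectly valid case, the distance between two points that are not connected is set to inf. But the code fails when it computes the centered matrix and the eigenvalues. Please help me handle this case.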
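To make the failure concrete, here is a minimal sketch of what happens to the double-centering step on a tiny distance matrix with one disconnected point. The 3x3 matrix is made up for illustration. Because every entry of H is nonzero, the matrix product mixes +inf and -inf terms, and inf - inf is NaN.

```python
import numpy as np

# Hypothetical 3-point example: point 2 is disconnected from points 0 and 1.
D = np.array([
    [0.0,    1.0,    np.inf],
    [1.0,    0.0,    np.inf],
    [np.inf, np.inf, 0.0],
])

n = D.shape[0]
H = np.eye(n) - np.ones((n, n)) / n  # centering matrix, all entries nonzero

# Suppress the floating-point warnings so the demo output stays quiet.
with np.errstate(invalid="ignore", over="ignore"):
    B = -0.5 * H @ D**2 @ H

# B now contains NaN entries, so it is no longer a usable input
# for the eigendecomposition.
print(np.isnan(B).any())
```

Once B contains NaN, the eigendecomposition cannot return meaningful eigenvalues, which is why the inf entries have to be dealt with before this step.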
1 Answer
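The fix is to modify the code so that it checks the distance matrix for inf values and replaces them with a large but finite value before double-centering and eigendecomposition. This ensures the code does not fail when the graph contains disconnected points.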
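A minimal sketch of that idea follows; it is one possible way to do it, not necessarily the exact code the answer had in mind. The names `replace_inf_distances`, `cmds_from_distances`, and `factor` are hypothetical, and using twice the largest finite distance as the replacement value is an assumption. The sketch also clips negative eigenvalues to zero before taking the square root, which the original function does not do, because a distance matrix patched this way is not guaranteed to be Euclidean.

```python
import numpy as np


def replace_inf_distances(D, factor=2.0):
    """Return a copy of D with inf entries replaced by a large finite value.

    Assumption: the replacement is factor * (largest finite distance in D).
    """
    D = np.asarray(D, dtype=float).copy()
    inf_mask = np.isinf(D)
    if not inf_mask.any():
        return D
    max_finite = D[~inf_mask].max()
    if max_finite == 0:
        max_finite = 1.0  # fallback when all finite distances are zero
    D[inf_mask] = factor * max_finite
    return D


def cmds_from_distances(D, n_dim):
    """Classical MDS on a precomputed (n, n) distance matrix that may contain inf."""
    D = replace_inf_distances(D)
    n = D.shape[0]
    # Centering matrix
    H = np.eye(n) - np.ones((n, n)) / n
    # Double-center the squared distance matrix
    B = -0.5 * H @ D**2 @ H
    evals, evecs = np.linalg.eigh(B)
    # Sort eigenvalues and eigenvectors in decreasing order
    order = np.argsort(evals)[::-1]
    evals = evals[order]
    evecs = evecs[:, order][:, :n_dim]
    # Clip negative eigenvalues (an addition to the original code) to avoid NaN in sqrt
    top = np.maximum(evals[:n_dim], 0.0)
    Y = np.sqrt(np.diag(top)) @ evecs.T
    return Y, evals, evecs


# Usage on a made-up example where point 2 is disconnected
D = np.array([
    [0.0,    1.0,    np.inf],
    [1.0,    0.0,    np.inf],
    [np.inf, np.inf, 0.0],
])
Y, evals, evecs = cmds_from_distances(D, n_dim=2)
print(Y.shape)
```

The choice of replacement value affects the embedding: a larger `factor` pushes disconnected points further away from the rest. An alternative is to embed each connected component separately.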