scipy: Testing similarity of several datasets by producing a cross-correlation matrix

Posted by r3i60tvu on 2023-04-06 in Other


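I am trying to compare several datasets and basically test whether they show the same feature, although this feature might be shifted, reversed, or attenuated. A very simple example is below: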

A = np.array([0., 0, 0, 1., 2., 3., 4., 3, 2, 1, 0, 0, 0])
B = np.array([0., 0, 0, 0, 0, 1, 2., 3., 4, 3, 2, 1, 0])
C = np.array([0., 0, 0, 1, 1.5, 2, 1.5, 1, 0, 0, 0, 0, 0])
D = np.array([0., 0, 0, 0, 0, -2, -4, -2, 0, 0, 0, 0, 0])
x = np.arange(0,len(A),1)

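I thought the best way to do this would be to normalize the signals and take their absolute values (their attenuation is not important to me at this stage; I am interested in the position, but I might be wrong, so I welcome thoughts on that concept too), and then calculate the area where they overlap. I am following up on this answer. The solution looks very elegant and simple, but I may be implementing it wrongly.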

def normalize(sig):
    #ns = sig/max(np.abs(sig))
    ns = sig/sum(sig)
    return ns
a = normalize(A)
b = normalize(B)
c = normalize(C)
d = normalize(D)

Plotted, the normalized signals look like this (figure not reproduced):

However, when I try to implement the solution from that answer, I run into problems.

OLD

for c1,w1 in enumerate([a,b,c,d]):
    for c2,w2 in enumerate([a,b,c,d]):
        w1 = np.abs(w1)
        w2 = np.abs(w2)
        M[c1,c2] = integrate.trapz(min(np.abs(w2).any(),np.abs(w1).any()))
print M

This produces either TypeError: 'numpy.bool_' object is not iterable or IndexError: list assignment index out of range. But I only included the .any() calls because without them I get ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
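For reference, the two errors have separate causes: Python's built-in min cannot compare two arrays element by element, and M is indexed as a 2-D array without ever being allocated as one. A minimal sketch of the overlap idea that avoids both is shown below. It assumes unit sample spacing and normalization by the sum of absolute values; the helper names normalize_abs and overlap_area are made up for illustration.

```python
import numpy as np

A = np.array([0., 0, 0, 1., 2., 3., 4., 3, 2, 1, 0, 0, 0])
B = np.array([0., 0, 0, 0, 0, 1, 2., 3., 4, 3, 2, 1, 0])
C = np.array([0., 0, 0, 1, 1.5, 2, 1.5, 1, 0, 0, 0, 0, 0])
D = np.array([0., 0, 0, 0, 0, -2, -4, -2, 0, 0, 0, 0, 0])

def normalize_abs(sig):
    # Hypothetical helper: rectify, then scale so the samples sum to 1.
    sig = np.abs(sig)
    return sig / sig.sum()

def overlap_area(w1, w2):
    # Element-wise minimum of the two curves (np.minimum, not built-in min),
    # integrated with the trapezoidal rule for unit sample spacing.
    lower = np.minimum(w1, w2)
    return 0.5 * np.sum(lower[1:] + lower[:-1])

signals = [normalize_abs(s) for s in (A, B, C, D)]
M = np.zeros((len(signals), len(signals)))  # allocate before indexing
for i, w1 in enumerate(signals):
    for j, w2 in enumerate(signals):
        M[i, j] = overlap_area(w1, w2)
print(M)
```

Note that this measure does not compensate for shifts: a and b have the same shape but are offset by two samples, so their overlap comes out well below 1.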

EDIT - NEW (thanks to @Kody King)

The new code is now:

M = np.zeros([4,4])
SH = np.zeros([4,4])
for c1,w1 in enumerate([a,b,c,d]):
    for c2,w2 in enumerate([a,b,c,d]):
        crossCorrelation = np.correlate(w1,w2, 'full')
        bestShift = np.argmax(crossCorrelation)

        # This reverses the effect of the padding.
        actualShift = bestShift - len(w2) + 1
        similarity = crossCorrelation[bestShift]

        M[c1,c2] = similarity
        SH[c1,c2] = actualShift
M = M/M.max()
print(M, '\n', SH)

and the output is:

[[ 1.          1.          0.95454545  0.63636364]
 [ 1.          1.          0.95454545  0.63636364]
 [ 0.95454545  0.95454545  0.95454545  0.63636364]
 [ 0.63636364  0.63636364  0.63636364  0.54545455]] 
[[ 0. -2.  1.  0.]
 [ 2.  0.  3.  2.]
 [-1. -3.  0. -1.]
 [ 0. -2.  1.  0.]]

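The shift matrix looks OK now, but the actual correlation matrix does not. I am really puzzled that the lowest correlation value is the one for correlating d with itself. What I would like to achieve now is:

EDIT - UPDATE

As advised, I used the recommended normalization formula (dividing each signal by its sum), but the problem was not solved, just reversed. Now d correlates with d at 1, but none of the other signals scores 1 against itself.
New output: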

[[ 0.45833333  0.45833333  0.5         0.58333333]
 [ 0.45833333  0.45833333  0.5         0.58333333]
 [ 0.5         0.5         0.57142857  0.66666667]
 [ 0.58333333  0.58333333  0.66666667  1.        ]] 
[[ 0. -2.  1.  0.]
 [ 2.  0.  3.  2.]
 [-1. -3.  0. -1.]
 [ 0. -2.  1.  0.]]

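1. The correlation value should be highest when a signal is correlated with itself (i.e., the highest values should lie on the main diagonal).
2. The correlation values should lie between 0 and 1, so that I would have 1s on the main diagonal and other numbers (0.x) elsewhere.
I was hoping that M = M/M.max() would do the job, but only if condition 1 is fulfilled, which it currently is not.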

scipy

Source: https://stackoverflow.com/questions/42847836/testing-similarity-of-several-datasets-by-producing-a-cross-correlation-matrix

2 Answers


jexiocij1#

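As ssm said, numpy's correlate function works well for this problem. You mentioned that you are interested in the position. The correlate function can also help you tell how far one sequence is shifted relative to another.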

import numpy as np

def compare(a, b):
    # 'full' pads the sequences with 0's so they are correlated
    # with as little as 1 actual element overlapping.
    crossCorrelation = np.correlate(a,b, 'full')
    bestShift = np.argmax(crossCorrelation)

    # This reverses the effect of the padding.
    actualShift = bestShift - len(b) + 1
    similarity = crossCorrelation[bestShift]

    print('Shift: ' + str(actualShift))
    print('Similarity: ' + str(similarity))
    return {'shift': actualShift, 'similarity': similarity}

print('\nExpected shift: 0')
compare([0,0,1,0,0], [0,0,1,0,0])
print('\nExpected shift: 2')
compare([0,0,1,0,0], [1,0,0,0,0])
print('\nExpected shift: -2')
compare([1,0,0,0,0], [0,0,1,0,0])
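The index arithmetic in compare can also be written with an explicit lag axis, which may make the "reverses the effect of the padding" step easier to follow. This is only a sketch: best_lag is a hypothetical helper, and it relies on the fact that output index i of np.correlate(a, b, 'full') corresponds to lag i - (len(b) - 1).

```python
import numpy as np

def best_lag(a, b):
    # Hypothetical helper: lag at which the full cross-correlation peaks.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cc = np.correlate(a, b, 'full')
    # One lag per output sample, from -(len(b) - 1) up to len(a) - 1.
    lags = np.arange(-(len(b) - 1), len(a))
    return int(lags[np.argmax(cc)])

print(best_lag([0, 0, 1, 0, 0], [0, 0, 1, 0, 0]))  # 0
print(best_lag([0, 0, 1, 0, 0], [1, 0, 0, 0, 0]))  # 2
print(best_lag([1, 0, 0, 0, 0], [0, 0, 1, 0, 0]))  # -2
```

A positive lag here means the first sequence is the second one delayed by that many samples.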

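Edit:

You need to normalize each sequence before correlating them, otherwise the larger sequences will have a very high correlation with all the other sequences.
One property of cross-correlation is:

$\sum \mathrm{CrossCorrelate}(f, g) = \left(\sum f\right) \cdot \left(\sum g\right)$

So if you normalize by dividing each sequence by its sum, the similarity will always be between 0 and 1, provided the normalized values are non-negative.
I recommend that you do not take the absolute value of a sequence. That changes the shape, not just the scale. For example, np.abs([1, -2]) == [1, 2]. Normalization already ensures that the sequence is mostly positive and sums to 1.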
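A quick numerical check of this property, and of what it implies for the peak value, might look like the sketch below. The helper name normalize_sum is an assumption; A and D are the signals from the question.

```python
import numpy as np

A = np.array([0., 0, 0, 1., 2., 3., 4., 3, 2, 1, 0, 0, 0])
D = np.array([0., 0, 0, 0, 0, -2, -4, -2, 0, 0, 0, 0, 0])

def normalize_sum(sig):
    # Divide by the plain sum, as suggested above (D becomes positive).
    return sig / sig.sum()

a, d = normalize_sum(A), normalize_sum(D)

# Summed over all lags, the cross-correlation equals sum(a) * sum(d) = 1.
total = np.correlate(a, d, 'full').sum()

# Peak of each signal correlated with itself: sum(s**2) / sum(s)**2.
peak_a = np.correlate(a, a, 'full').max()
peak_d = np.correlate(d, d, 'full').max()
print(total, peak_a, peak_d)
```

With this normalization the values over all lags sum to 1, so each one is at most 1. The self-correlation peak, however, equals sum(s**2) / sum(s)**2, which is larger for narrower signals. That is consistent with d scoring highest against itself in the question's updated output.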

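Second edit:

I had a realization. Think of the signals as vectors. A normalized vector always has its maximum dot product with itself. Cross-correlation is just the dot product computed at different shifts. If you normalize each signal as a vector (divide by sqrt(s dot s)), the autocorrelation will always be the maximum, and equal to 1.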
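The reasoning can be sketched with the Cauchy-Schwarz inequality. The notation is introduced here for illustration only: f and g are the signals, k is the shift, and the sums run over the samples where the shifted signals overlap.

```latex
\left| \sum_n f[n+k]\, g[n] \right| \le \lVert f \rVert \, \lVert g \rVert,
\qquad
\sum_n f[n]\, f[n] = \lVert f \rVert^2
```

After dividing each signal by its own norm, the right-hand side of the inequality is 1, so every cross-correlation value lies between -1 and 1, and a signal correlated with itself at zero shift gives exactly 1.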

import numpy as np

def normalize(s):
    magSquared = np.correlate(s, s) # s dot itself
    return s / np.sqrt(magSquared)

a = np.array([0., 0, 0, 1., 2., 3., 4., 3, 2, 1, 0, 0, 0])
b = np.array([0., 0, 0, 0, 0, 1, 2., 3., 4, 3, 2, 1, 0])
c = np.array([0., 0, 0, 1, 1.5, 2, 1.5, 1, 0, 0, 0, 0, 0])
d = np.array([0., 0, 0, 0, 0, -2, -4, -2, 0, 0, 0, 0, 0])

a = normalize(a)
b = normalize(b)
c = normalize(c)
d = normalize(d)

M = np.zeros([4,4])
SH = np.zeros([4,4])
for c1,w1 in enumerate([a,b,c,d]):
    for c2,w2 in enumerate([a,b,c,d]):
        # Taking the absolute value catches signals which are flipped.
        crossCorrelation = np.abs(np.correlate(w1, w2, 'full'))
        bestShift = np.argmax(crossCorrelation)

        # This reverses the effect of the padding.
        actualShift = bestShift - len(w2) + 1
        similarity = crossCorrelation[bestShift]

        M[c1,c2] = similarity
        SH[c1,c2] = actualShift
print(M, '\n', SH)

Output:

[[ 1.          1.          0.97700842  0.86164044]
 [ 1.          1.          0.97700842  0.86164044]
 [ 0.97700842  0.97700842  1.          0.8819171 ]
 [ 0.86164044  0.86164044  0.8819171   1.        ]]
[[ 0. -2.  1.  0.]
 [ 2.  0.  3.  2.]
 [-1. -3.  0. -1.]
 [ 0. -2.  1.  0.]]


ndasle7k2#

You want to use the cross-correlation between the vectors:

For example:

>>> np.correlate(A,B)
array([ 31.])

>>> np.correlate(A,C)
array([ 19.])

>>> np.correlate(A,D)
array([-28.])

If you don't care about the sign, you can simply take the absolute value...
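One caveat worth spelling out: with its default mode ('valid') and equal-length inputs, np.correlate returns a single value, the dot product at zero shift, so the numbers above do not account for a shifted signal. A small sketch that reproduces them and takes the absolute value (the dictionary names are made up for illustration):

```python
import numpy as np

A = np.array([0., 0, 0, 1., 2., 3., 4., 3, 2, 1, 0, 0, 0])
B = np.array([0., 0, 0, 0, 0, 1, 2., 3., 4, 3, 2, 1, 0])
C = np.array([0., 0, 0, 1, 1.5, 2, 1.5, 1, 0, 0, 0, 0, 0])
D = np.array([0., 0, 0, 0, 0, -2, -4, -2, 0, 0, 0, 0, 0])

# Default mode is 'valid': one output sample, identical to np.dot.
zero_lag = {name: float(np.correlate(A, other)[0])
            for name, other in (('B', B), ('C', C), ('D', D))}
# Ignore the sign so that a flipped signal still counts as similar.
magnitude = {name: abs(value) for name, value in zero_lag.items()}

print(zero_lag)   # {'B': 31.0, 'C': 19.0, 'D': -28.0}
print(magnitude)  # {'B': 31.0, 'C': 19.0, 'D': 28.0}
```

To compare signals regardless of their offset, pass 'full' as the mode and take the maximum over all shifts instead.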

