scipy 连通球群的检测

kcugc4gi  于 2022-11-10  发布在  其他
关注(0)|答案(2)|浏览(157)

我需要检测哪些球体相互连接。如果我们有:

radii = np.array([2, 1, 1, 2, 2, 0.5])
poss = np.array([[7, 7, 7], [7.5, 8.5, 6], [0, 0, 0], [-1, -2, -1], [1, 1, 1], [2, 1, 3]])

我需要一个布尔数组(shape = (number of groups, number of spheres))或数组/数组列表/索引列表来显示哪些球体是连接的。因此,本例的预期结果必须如下所示:

Boolean_array = np.array([[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 1, 1]], dtype=bool)

object_array = np.array([[0, 1], [2, 3, 4, 5]])

我尝试用networkx(我不是很熟悉)和IDK找到一个解决方案,如果这个库可以帮助我们找到不同半径的球体。我想,在我的previous code中返回的 ends_ind 在这方面可能会有帮助,我尝试使用它作为:

G = nx.Graph([*ends_ind])
L = [nx.node_connected_component(G, 0)]
for i in range(len(radii)):
    iter = 0
    for j in L:
        if i in j:
            iter += 1
    if iter == 0:
        L.append(nx.node_connected_component(G, i))

这将不起作用。错误:

Traceback (most recent call last):
  File "C:/Users/Ali/Desktop/check_2.py", line 31, in <module>
    L.append(nx.node_connected_component(G, i))
  File "<class 'networkx.utils.decorators.argmap'> compilation 8", line 4, in argmap_node_connected_component_5
  File "C:\Users\Ali\anaconda3\envs\PFC_FiPy\lib\site-packages\networkx\algorithms\components\connected.py", line 185, in node_connected_component
    return _plain_bfs(G, n)
  File "C:\Users\Ali\anaconda3\envs\PFC_FiPy\lib\site-packages\networkx\algorithms\components\connected.py", line 199, in _plain_bfs
    nextlevel.update(G_adj[v])
  File "C:\Users\Ali\anaconda3\envs\PFC_FiPy\lib\site-packages\networkx\classes\coreviews.py", line 82, in __getitem__
    return AtlasView(self._atlas[name])
KeyError: 11

由于使用我以前的代码与其他库将是一个效率低下的代码(如果它可以解决这个问题),我正在寻找任何库,如networkx,或方法,可以做它在一个更有效的方式,如果可能的话。
获得预期结果的最佳方法是什么,特别是对于大量球体(~100000)。

x8diyxa7

x8diyxa71#

这里,你过早地尝试使用networkx。首先,你应该计算每对球体的几何距离。一个有用的技巧是:

xyz_distances = poss.reshape(6, 1, 3) - poss.reshape(1, 6, 3)
distances = np.linalg.norm(xyz_distances, axis=2)

这就得到了一个对称的6x6的球面中心之间的欧氏距离的数组。现在,我们需要比较最大可能的距离。这只是每对球面的半径之和,也是一个6x6的数组,我们可以计算为

maximum_distances = radii.reshape(6, 1) + radii.reshape(1, 6)

现在我们可以比较一下这两者:

>>> connections = distances < maximum_distances
>>> connections
array([[ True,  True, False, False, False, False],
       [ True,  True, False, False, False, False],
       [False, False,  True,  True,  True, False],
       [False, False,  True,  True, False, False],
       [False, False,  True, False,  True,  True],
       [False, False, False, False,  True,  True]])

转换为两个组,正如您所希望的那样-您可以通过

>>> G = nx.Graph(connections)
>>> list(nx.connected_components(G))
[{0, 1}, {2, 3, 4, 5}]

请注意,这整个事情将按N^2的球体数量进行缩放,您可能需要以某种方式对其进行优化(例如,通过scipy.spatial.ckdtree)。

2hh7jdfx

2hh7jdfx2#

在我对18000个球体进行的一次测试中,NumPy的linalg泄漏了内存,但SciPy的cdist内存效率更高,并且工作正常[ ref 1 ]。似乎可以将计算限制在刚好是数组直径上方的三角形内,这在内存使用和时间消耗方面可以更有效。多亏了Dominik answer,我们可以使用Numba加速器作为并行的非Python模式来完成此过程:

import numpy as np
import numba as nb
from scipy.spatial.distance import cdist
import networkx as nx

def distances_linalg(radii, poss):
    xyz_distances = poss.reshape(radii.shape[0], 1, 3) - poss.reshape(1, radii.shape[0], 3)
    return radii.reshape(radii.shape[0], 1) + radii.reshape(1, radii.shape[0]), np.linalg.norm(xyz_distances, axis=2)

def distances_cdist(radii, poss):
    return radii.reshape(radii.shape[0], 1) + radii.reshape(1, radii.shape[0]), cdist(poss, poss)

@nb.njit("(Tuple([float64[:, ::1], float64[:, ::1]]))(float64[::1], float64[:, ::1])", parallel=True)
def distances_numba(radii, poss):
    radii_arr = np.zeros((radii.shape[0], radii.shape[0]), dtype=np.float64)
    poss_arr = np.zeros((poss.shape[0], poss.shape[0]), dtype=np.float64)
    for i in nb.prange(radii.shape[0] - 1):
        for j in range(i+1, radii.shape[0]):
            radii_arr[i, j] = radii[i] + radii[j]
            poss_arr[i, j] = ((poss[i, 0] - poss[j, 0])**2 + (poss[i, 1] - poss[j, 1])**2 + (poss[i, 2] - poss[j, 2])**2)**0.5
    return radii_arr, poss_arr

def connected_spheres(radii, poss, method=distances_numba):
    maximum_distances, distances = method(radii, poss)
    connections = distances < maximum_distances
    G = nx.Graph(connections)
    return list(nx.connected_components(G))

# numba radii                      cdist or linalg radii

# [[0.  3.  3.  4.  4.  2.5]            [[4.  3.  3.  4.  4.  2.5]

# [0.  0.  2.  3.  3.  1.5]             [3.  2.  2.  3.  3.  1.5]

# [0.  0.  0.  3.  3.  1.5]             [3.  2.  2.  3.  3.  1.5]

# [0.  0.  0.  0.  4.  2.5]             [4.  3.  3.  4.  4.  2.5]

# [0.  0.  0.  0.  0.  2.5]             [4.  3.  3.  4.  4.  2.5]

# [0.  0.  0.  0.  0.  0. ]]            [2.5 1.5 1.5 2.5 2.5 1. ]]

# numba poss

# [[ 0.          1.87082869 12.12435565 14.45683229 10.39230485  8.77496439]

# [ 0.          0.         12.82575534 15.21512405 11.11305539  9.77241014]

# [ 0.          0.          0.          2.44948974  1.73205081  3.74165739]

# [ 0.          0.          0.          0.          4.12310563  5.83095189]

# [ 0.          0.          0.          0.          0.          2.23606798]

# [ 0.          0.          0.          0.          0.          0.        ]]

# cdist or linalg poss

# [[ 0.          1.87082869 12.12435565 14.45683229 10.39230485  8.77496439]

# [ 1.87082869  0.         12.82575534 15.21512405 11.11305539  9.77241014]

# [12.12435565 12.82575534  0.          2.44948974  1.73205081  3.74165739]

# [14.45683229 15.21512405  2.44948974  0.          4.12310563  5.83095189]

# [10.39230485 11.11305539  1.73205081  4.12310563  0.          2.23606798]

# [ 8.77496439  9.77241014  3.74165739  5.83095189  2.23606798  0.        ]]

在我对18000个球体的测试中,它至少比cdist快2倍。我认为,与cdist相比,numba将非常有助于避免大型数组的内存泄漏。

解决方案2:

我们可以通过numba在an improved cdist code的基础上编写distances_numba。在这个解决方案中,我试图修改代码,使其仅在数组的上三角形上进行调整:

@nb.njit("float64[:, ::1](float64[:, ::1])", parallel=True)
def dot_triu(poss):
    assert poss.shape[1] == 3
    poss_T = poss.T
    dot = np.zeros((poss.shape[0], poss.shape[0]), dtype=poss.dtype)
    for i in nb.prange(poss.shape[0] - 1):
        for j in range(i + 1, poss.shape[0]):
            dot[i, j] = poss[i, 0] * poss_T[0, j] + poss[i, 1] * poss_T[1, j] + poss[i, 2] * poss_T[2, j]
    return dot

@nb.njit("float64[::1](float64[:, ::1])", parallel=True)
def poss_(poss):
    TMP_A = np.zeros(poss.shape[0], dtype=np.float64)
    for i in nb.prange(poss.shape[0]):
        for j in range(poss.shape[1]):
            TMP_A[i] += poss[i, j]**2
    return TMP_A

@nb.njit("(Tuple([float64[:, ::1], float64[:, ::1]]))(float64[::1], float64[:, ::1])", parallel=True)
def distances_numba(radii, poss):
    poss_arr = dot_triu(poss)
    TMP_A = poss_(poss)

    radii_arr = np.zeros((radii.shape[0], radii.shape[0]), dtype=np.float64)
    for i in nb.prange(poss.shape[0] - 1):
        for j in range(i + 1, poss.shape[0]):
            radii_arr[i, j] = radii[i] + radii[j]
            poss_arr[i, j] = (-2. * poss_arr[i, j] + TMP_A[i] + TMP_A[j])**0.5

    return radii_arr, poss_arr

相关问题