如何在SQL Server中对附近的纬度和经度进行分组并为该组分配名称/编号？

ufj5ltwl 于 2023-01-08 发布在 SQL Server

关注(0)|答案(1)|浏览(163)

我有一个包含ID、LATITUDE、LONGITUDE、COUNTRY_CD的表，我尝试将ID在40米范围内分组（聚类），并为该组分配名称/编号。例如，下面有7个ID记录在40米范围内，需要分配名称/编号。
我的表中有100 K条记录，记录了世界各地的纬度和经度，并且会有超过100个来自一个国家的聚类，我不知道每个国家会有多少个聚类。
我可以选择附近点Ex的链，ID 1和ID 3都“接近"ID 6（但不是彼此）。

create table #temp
(
    ID varchar(10),
    LATITUDE [decimal](11, 8),
    LONGITUDE [decimal](11, 8),
    COUNTRY_CD [char](2)
)

insert into #temp select 'ID1', 10.81583689, 78.61898689, 'IN'
insert into #temp select 'ID2', 10.81513789, 78.61898789, 'IN'
insert into #temp select 'ID3', 10.81514889, 78.61894889, 'IN'
insert into #temp select 'ID4', 10.81523989, 78.61898989, 'IN'
insert into #temp select 'ID5', 10.81521089, 78.61891089, 'IN'
insert into #temp select 'ID6', 10.81551189, 78.61891189, 'IN'
insert into #temp select 'ID7', 10.81551189, 78.61791189, 'IN'
insert into #temp select 'ID8', 10.81561189, 78.61792189, 'IN'
insert into #temp select 'ID9', 10.81571189, 78.61793189, 'IN'

select                    
    t1.ID, t2.ID,
    t1.LATITUDE, 
    t1.LONGITUDE,
    t1.COUNTRY_CD,
    --calculate the distance in meters
    cast(6378137.0 * sqrt(power((radians(t1.LATITUDE) - radians(t2.LATITUDE)), 2) 
         + power((radians(t1.LONGITUDE) - radians(t2.LONGITUDE)) * cos(radians(t1.LATITUDE)), 2)) as integer) as MAPPING_DISTANCE,
    (row_number() over (partition by t1.ID order by
              --rank the distance in meters
              cast(6378137.0*sqrt(power((radians(t1.LATITUDE)-radians(t2.LATITUDE)),2) 
              + power((radians(t1.LONGITUDE)-radians(t2.LONGITUDE))*cos(radians(t1.LATITUDE)),2)) as integer) asc
              )) as DISTANCE_RANK
from  
    (select 
         ID, LATITUDE, LONGITUDE, COUNTRY_CD
     from   
         #temp) t1
--join the above list of ID to get near by ID
inner join 
     (select
          ID, LATITUDE, LONGITUDE, COUNTRY_CD
      from  
          #temp) t2 on t1.COUNTRY_CD = t2.COUNTRY_CD
                    --this brings ID available in 75 meters radius
                    and (t2.LATITUDE between (t1.LATITUDE - 0.00056) and (t1.LATITUDE + 0.00056))
                    and (t2.LONGITUDE between (t1.LONGITUDE - 0.00076) and (t1.LONGITUDE + 0.00076))    
                    --distance between t1 co-ordinates and t2 co-ordinates in meters
                    and (cast(6378137.0*sqrt(power((radians(t1.LATITUDE)-radians(t2.LATITUDE)),2) + power((radians(t1.LONGITUDE)-radians(t2.LONGITUDE))*cos(radians(t1.LATITUDE)),2)) as integer)) <= 40 --limit to 40 meters
                    and t1.ID != t2.ID     --exclude the same ID

上面的查询带来了40米以内的ID，但我不知道如何过滤集群中的ID？例如，“Cluster_1”？
参考此图片2 clusters from above 9 ID
请注意，我不会给予任何特定的坐标作为输入，但查询必须从表中的可用坐标中自动选择距离内的ID。
我的预期结果如下，

ID  LATITUDE    LONGITUDE   COUNTRY_CD  CLUSTER_NAME
ID1 10.81583689 78.61898689 IN  Cluster_1
ID2 10.81513789 78.61898789 IN  Cluster_1
ID3 10.81514889 78.61894889 IN  Cluster_1
ID4 10.81523989 78.61898989 IN  Cluster_1
ID5 10.81521089 78.61891089 IN  Cluster_1
ID6 10.81551189 78.61891189 IN  Cluster_1
ID7 10.81551189 78.61791189 IN  Cluster_2
ID8 10.81561189 78.61792189 IN  Cluster_2
ID9 10.81571189 78.61793189 IN  Cluster_2

有什么建议如何过滤ID是在一个集群？如果有任何其他简单的方法来做到这一点将是伟大的！

sql-server

来源：https://stackoverflow.com/questions/41937332/how-to-group-nearby-latitude-and-longitude-and-assign-a-name-number-to-that-grou

1条答案

按热度按时间

2izufjch1#

首先，我们创建一个计算的geography列，用于存储位置坐标。我们将使用此列让SQL Server为我们计算距离：

ALTER TABLE #temp
ADD Point_Geolocation AS geography::STPointFromText('POINT(' + CAST(LONGITUDE AS VARCHAR(100))+ ' ' + CAST(LATITUDE AS VARCHAR(100)) +')', 4326) PERSISTED

其次，创建一个包含所有邻近位置的表：

IF OBJECT_ID('tempdb..#Nearby_Points') IS NOT NULL DROP TABLE #Nearby_Points
CREATE TABLE #Nearby_Points (
        ID_1 VARCHAR(10) NOT NULL,
        ID_2 VARCHAR(10) NOT NULL,
        PRIMARY KEY (ID_1, ID_2)
)

INSERT INTO #Nearby_Points
(
    ID_1,
    ID_2
)
SELECT   t1.ID AS p1_ID
        ,t2.ID AS p2_ID
FROM #temp t1
    INNER JOIN #temp t2
        ON t1.ID < t2.ID
WHERE t1.Point_Geolocation.STDistance(t2.Point_Geolocation) < 40 -- Specify distance criteria here

-- SELECT * FROM #Nearby_Points

**注：**对于100k+坐标，我们需要进行大约50亿次计算：(100,000 ^ 2) / 2。执行上述查询可能需要一段时间。

第三，让我们创建一个表来存储我们的集群列表：

IF OBJECT_ID('tempdb..#Clusters') IS NOT NULL DROP TABLE #Clusters
CREATE TABLE #Clusters(
    Cluster_ID INT NOT NULL,
    Point_ID VARCHAR(10) NOT NULL,
    PRIMARY KEY(Cluster_ID, Point_ID)
);

-- This index may improve performance a little
CREATE NONCLUSTERED INDEX IX_Point_ID ON #Clusters(Point_ID);

最后，下面的代码将：
1.为第一个不在簇中的点创建新簇。
1.重复地重新扫描聚类表，并向现有聚类添加附加点，直到每个聚类包含应当属于它的所有点。
1.转到上面的步骤1并重复，直到没有创建新的群集。

DECLARE @Rowcount INT

INSERT INTO #Clusters
(
    Cluster_ID,
    Point_ID
)
SELECT   COALESCE((SELECT MAX(Cluster_ID) FROM #Clusters),0) + 1
        ,MIN(np.ID_1)
FROM #Nearby_Points np
WHERE np.ID_1 NOT IN (SELECT Point_ID FROM #Clusters)
HAVING MIN(np.ID_1) IS NOT NULL

SET @Rowcount = @@ROWCOUNT

WHILE @Rowcount > 0
BEGIN

    WHILE @Rowcount > 0
    BEGIN

            INSERT INTO #Clusters
            (
                Cluster_ID,
                Point_ID
            )
            SELECT   Cluster_ID
                    ,Point_ID
            FROM (
                    SELECT   np.ID_2 AS Point_ID
                            ,c.Cluster_ID
                    FROM #Nearby_Points np
                        INNER JOIN #Clusters c
                            ON np.ID_1 = c.Point_ID

                    UNION

                    SELECT   np.ID_1
                            ,c.Cluster_ID
                    FROM #Nearby_Points np
                        INNER JOIN #Clusters c
                            ON np.ID_2 = c.Point_ID
            ) vals
            WHERE NOT EXISTS (
                    SELECT 1
                    FROM #Clusters
                    WHERE Cluster_ID = vals.Cluster_ID
                    AND Point_ID = vals.Point_ID
            )

            SET @Rowcount = @@ROWCOUNT
    END

    INSERT INTO #Clusters
    (
        Cluster_ID,
        Point_ID
    )
    SELECT   COALESCE((SELECT MAX(Cluster_ID) FROM #Clusters),0) + 1
            ,MIN(np.ID_1)
    FROM #Nearby_Points np
    WHERE np.ID_1 NOT IN (SELECT Point_ID FROM #Clusters)
    HAVING MIN(np.ID_1) IS NOT NULL

    SET @Rowcount = @@ROWCOUNT
END

瞧

SELECT *
FROM #Clusters c


|Cluster_ID | Point_ID|
|-----------|---------|
|         1 | ID1     |
|         1 | ID2     |
|         1 | ID3     |
|         1 | ID4     |
|         1 | ID5     |
|         1 | ID6     |
|         2 | ID7     |
|         2 | ID8     |
|         2 | ID9     |

赞(0）回复(0）举报 2023-01-08

我来回答

如何在SQL Server中对附近的纬度和经度进行分组并为该组分配名称/编号？

1条答案

相关问题

热门标签

最新问答