SQL Server Extract distinct graph-linked values from two columns

baubqpgj  于 2023-06-04  发布在  其他
关注(0)|答案(3)|浏览(120)
ColumnACoumnB
15
21
310
48
14
15
25

In this,

  • 1 is link with 5
  • 5 is link with 2
  • 2 is link with 1
  • 1 is link with 4
  • 4 is link with 8
  • 1 is link with 5
  • 5 is link with 2

So the output what I want is distinct values that are linked with each other
| output |
| ------------ |
| 1 |
| 5 |
| 2 |
| 4 |
| 8 |

9vw9lbht

9vw9lbht1#

One approach that attempts to solve this problem is the following:

  • extract distinct combinations of your pairs
  • count the frequency of first column values
  • extract each pairs whose first column appears more than once, or that appears in the second column at least once
  • linearize your data

Each of these steps is done in a separate cte.

WITH cte AS (
    SELECT DISTINCT LEAST(ColumnA, ColumnB)    AS col1, 
                    GREATEST(ColumnA, ColumnB) AS col2
    FROM tab
), cte2 AS (
    SELECT col1, col2, 
           COUNT(col1) OVER(PARTITION BY col1) AS cnt_col1
    FROM cte t1
), cte3 AS (
    SELECT * FROM cte2 WHERE cnt_col1 > 1 
                          OR EXISTS(SELECT 1 FROM cte2 t2 WHERE cte2.col1 = t2.col2)
)
SELECT col1 FROM cte3 UNION SELECT col2 FROM cte3

If you instead need to begin from a specific index and are looking for all linked nodes in the graph, you need a recursive query:

  • Base step: extracts unique records of <ColumnA, ColumnB> for a specific id
  • Recursive step: extracts consecutive records on cte.ColumnB = tab.ColumnA.

Then it's just sufficient to gather distinct values of ColumnB in the output. In the following case ColumnA = 1, if you want to change it, you need to change the value in the filtering condition of the base step WHERE LEAST(ColumnA, ColumnB) = <your_value>

WITH cte AS (
    SELECT DISTINCT LEAST(ColumnA, ColumnB)    AS ColumnA,
                    GREATEST(ColumnA, ColumnB) AS ColumnB
    FROM tab 
    WHERE LEAST(ColumnA, ColumnB) = 1

    UNION ALL

    SELECT tab.ColumnA, tab.ColumnB 
    FROM cte 
    INNER JOIN tab
            ON cte.ColumnB = tab.ColumnA
)
SELECT DISTINCT ColumnB FROM cte

Output (for id = 1):

col1
1
2
4
5
8

Check the demo here .

e5nszbig

e5nszbig2#

I guess the following is what you are looking for, but I believe your sample data is not correct in respect of the desired output:

DROP TABLE IF EXISTS #TempTable

CREATE TABLE #TempTable (
  ColumnA INT,
  ColumnB INT
)

INSERT INTO #TempTable (ColumnA, ColumnB)
VALUES (1, 5), (2, 1), (3, 10), (4, 8), (1, 4), (1, 5), (2, 5)

SELECT group_id
      ,ColumnA
      ,ColumnB
      ,COUNT(*)
FROM
(
    SELECT 1 as group_id,*
    FROM #TempTable
    UNION ALL
    SELECT 2,ColumnB, ColumnA
    FROM #TempTable
) DS
GROUP BY group_id
        ,ColumnA
        ,ColumnB
HAVING COUNT(*) = 2

The idea is to unite the data in two groups and get only this records with count 2 - meaning we have x-y and y-x scenario.

The output of the above query is:

From there with another nested query or CTE you can get only the columns you want.

gj3fmq9x

gj3fmq9x3#

Bsed on the simple data and the expected output, I think this is what you are looking for :

with cte as (
  select  ColumnA, ColumnB
  from mytable
  UNION ALL
  SELECT t.ColumnA, t.ColumnB
  from mytable t
  inner join cte c on c.ColumnB = t.ColumnA
)
select ColumnA AS linked_values
from cte
group by ColumnA
having count(*) > 1
union 
select ColumnB
from cte
group by ColumnB
having count(*) > 1

Result :

linked_values
1
2
4
5
8

Demo here

相关问题