多个id的内部连接(如果它们不为null)

cigdeys3  于 2021-06-25  发布在  Hive
关注(0)|答案(1)|浏览(290)

我在创建捕获以下记录的连接时遇到了一些问题。我昨天花了大约5个小时想弄明白,但是弄不明白。
我有两个表,表a和表b这两个表都有以下列:

ID_1, ID_2, ID_3, ID_4

现在,我需要在两个表之间创建一个连接,以便结果在匹配的id上提取不为null的记录,如果超过1个id匹配,那么我将使用所有匹配的id来提取记录,因此有两种情况:
场景1:两个表中的所有ID都完全匹配(这很容易编码)
在这里我会加入所有的身份证。

+--------+---------+---------+--------+
| A.ID_1 |  A.ID_2 |  A.ID_3 | A.ID_4 |
+--------+---------+---------+--------+
| CAD    |   AAPL  |     853 |    200 |
+--------+---------+---------+--------+

+--------+--------+--------+--------+
| B.ID_1 | B.ID_2 | B.ID_3 | B.ID_4 |
+--------+--------+--------+--------+
| CAD    | AAPL   |    853 |    200 |
+--------+--------+--------+--------+

场景2:两个表中有一个或多个id匹配,其余为空(也很简单)
在这里,我将加入由身份证1和身份证3只。

+--------+--------+--------+--------+
| A.ID_1 | A.ID_2 | A.ID_3 | A.ID_4 |
+--------+--------+--------+--------+
| CAD    | NULL   |    933 | NULL   |
+--------+--------+--------+--------+

+--------+--------+--------+--------+
| B.ID_1 | B.ID_2 | B.ID_3 | B.ID_4 |
+--------+--------+--------+--------+
| CAD    | NULL   |    933 | NULL   |
+--------+--------+--------+--------+

场景3:表中有一个或多个id匹配,但有些不匹配
在这里,我只需要在id\u1和id\u2上进行连接,因为id\u3和id\u4对于各自的表是空的。

+--------+--------+--------+--------+
| A.ID_1 | A.ID_2 | A.ID_3 | A.ID_4 |
+--------+--------+--------+--------+
| CAD    |  TSLA  |    341 | NULL   |
+--------+--------+--------+--------+

+--------+--------+--------+--------+
| B.ID_1 | B.ID_2 | B.ID_3 | B.ID_4 |
+--------+--------+--------+--------+
| CAD    |  TSLA  |  NULL  |    250 |
+--------+--------+--------+--------+

场景4:所有id都为空,因此记录被拒绝
如果表a包含以下内容:

+--------+--------+--------+--------+
| A.ID_1 | A.ID_2 | A.ID_3 | A.ID_4 |
+--------+--------+--------+--------+
| CAD    |  AAPL  |  853   |   200  |
+--------+--------+--------+--------+
| CAD    | NULL   |  933   | NULL   |
+--------+--------+--------+--------+ 
| CAD    |  TSLA  |  341   | NULL   |
+--------+--------+--------+--------+
| NULL   |  NULL  |  NULL  | NULL   |
+--------+--------+--------+--------+

表b包含以下内容:

+--------+--------+--------+--------+
| B.ID_1 | B.ID_2 | B.ID_3 | B.ID_4 |
+--------+--------+--------+--------+
| CAD    |  AAPL  |  853   |   200  |
+--------+--------+--------+--------+
| CAD    |  NULL  |  933   |  NULL  |
+--------+--------+--------+--------+ 
| CAD    |  TSLA  |  NULL  |   250  |
+--------+--------+--------+--------+
| NULL   |  NULL  |  NULL  |  NULL  |
+--------+--------+--------+--------+

结果是:

+--------+--------+--------+--------+
| ID_1   | ID_2   | ID_3   | ID_4   |
+--------+--------+--------+--------+
| CAD    |  AAPL  |  853   |   200  |
+--------+--------+--------+--------+
| CAD    |  NULL  |  933   |  NULL  |
+--------+--------+--------+--------+ 
| CAD    |  TSLA  |  341   |  NULL  |
+--------+--------+--------+--------+

谢谢

xqnpmsa8

xqnpmsa81#

也许你想要这样的东西?总有人写的代码比较短…:-)

WITH directMatch AS (
    SELECT
        A.*
    FROM
        A
        INNER JOIN B
            -- exclude scenario 4
            ON (
                A.ID_1 IS NOT NULL
                OR A.ID_2 IS NOT NULL
                OR A.ID_3 IS NOT NULL
                OR A.ID_4 IS NOT NULL
            )
            AND (
                B.ID_1 IS NOT NULL
                OR B.ID_2 IS NOT NULL
                OR B.ID_3 IS NOT NULL
                OR B.ID_4 IS NOT NULL
            )
            -- keep scenario 1+2
            AND (
                A.ID_1 = B.ID_1
                OR A.ID_1 IS NULL AND B.ID_1 IS NULL
            )
            AND (
                A.ID_2 = B.ID_2
                OR A.ID_2 IS NULL AND B.ID_2 IS NULL
            )
            AND (
                A.ID_3 = B.ID_3
                OR A.ID_3 IS NULL AND B.ID_3 IS NULL
            )
            AND (
                A.ID_4 = B.ID_4
                OR A.ID_4 IS NULL AND B.ID_4 IS NULL
            )
)
SELECT
    *
FROM
    -- scenario 1+2
    directMatch
UNION ALL SELECT
    A.*
FROM
    A
    INNER JOIN B
        -- exclude scenario 4
        ON (
            A.ID_1 IS NOT NULL
            OR A.ID_2 IS NOT NULL
            OR A.ID_3 IS NOT NULL
            OR A.ID_4 IS NOT NULL
        )
        AND (
            B.ID_1 IS NOT NULL
            OR B.ID_2 IS NOT NULL
            OR B.ID_3 IS NOT NULL
            OR B.ID_4 IS NOT NULL
        )
        -- scenario 3
        AND (
            COALESCE(A.ID_1, B.ID_1) = COALESCE(B.ID_1, A.ID_1)
            OR A.ID_1 IS NULL AND B.ID_1 IS NULL
        )
        AND (
            COALESCE(A.ID_2, B.ID_2) = COALESCE(B.ID_2, A.ID_2)
            OR A.ID_2 IS NULL AND B.ID_2 IS NULL
        )
        AND (
            COALESCE(A.ID_3, B.ID_3) = COALESCE(B.ID_3, A.ID_3)
            OR A.ID_3 IS NULL AND B.ID_3 IS NULL
        )
        AND (
            COALESCE(A.ID_4, B.ID_4) = COALESCE(B.ID_4, A.ID_4)
            OR A.ID_4 IS NULL AND B.ID_4 IS NULL
        )
        AND NOT EXISTS(
            SELECT
                *
            FROM
                directMatch m
            WHERE 
                (
                    A.ID_1 = m.ID_1
                    OR A.ID_1 IS NULL AND m.ID_1 IS NULL
                )
                AND (
                    A.ID_2 = m.ID_2
                    OR A.ID_2 IS NULL AND m.ID_2 IS NULL
                )
                AND (
                    A.ID_3 = m.ID_3
                    OR A.ID_3 IS NULL AND m.ID_3 IS NULL
                )
                AND (
                    A.ID_4 = m.ID_4
                    OR A.ID_4 IS NULL AND m.ID_4 IS NULL
                )
        )
        AND NOT EXISTS(
            SELECT
                *
            FROM
                directMatch m
            WHERE 
                (
                    B.ID_1 = m.ID_1
                    OR B.ID_1 IS NULL AND m.ID_1 IS NULL
                )
                AND (
                    B.ID_2 = m.ID_2
                    OR B.ID_2 IS NULL AND m.ID_2 IS NULL
                )
                AND (
                    B.ID_3 = m.ID_3
                    OR B.ID_3 IS NULL AND m.ID_3 IS NULL
                )
                AND (
                    B.ID_4 = m.ID_4
                    OR B.ID_4 IS NULL AND m.ID_4 IS NULL
                )
        )

相关问题