在具有忽略空值的条件相等操作的字段上配置单元sql联接

3b6akqbq  于 2021-06-25  发布在  Hive
关注(0)|答案(2)|浏览(333)

我有一张table

ID1 ID2 ID3 VALUE
1   A   xx  100
1   A   yy  200
1   A   yy  100
1   A       100
1   B   zz  300
2   B   yy  300
2   C   yy  400

表b有如下操作或规则,它可以写在3个以上的id字段上

ID1 ID2 ID3 ACTION
    A   xx  GetAxx
    A   yy  GetAyy
    B       GetB

我想加入表并产生如下结果

ID1 ID2 ID3 TOTAL
1   A   xx  100
1   A   yy  200
1   A   yy  100
1   B   zz  300
2   B   yy  300

我尝试了simple or condition和also conditional连接,但是得到了相同的错误失败:semanticexception cartesian products出于安全原因被禁用。如果您知道自己在做什么,请将hive.strict.checks.cartesian.product设置为false,并且hive.mapred.mode未设置为“strict”以继续。

SELECT * FROM TABLE_A A
JOIN TABLE_B B
ON
(B.ID1 IS NULL OR A.ID1=B.ID1) AND
(B.ID2 IS NULL OR A.ID2=B.ID2) AND
(B.ID3 IS NULL OR A.ID3=B.ID3)

SELECT * FROM TABLE_A A
JOIN TABLE_B B
ON
(COALESCE(A.ID1,"NA")=COALESCE(B.ID1, A.ID1, "NA") and
(COALESCE(A.ID2,"NA")=COALESCE(B.ID2, A.ID2, "NA") and
(COALESCE(A.ID3,"NA")=COALESCE(B.ID3, A.ID3, "NA")

当忽略连接条件中的空比较时,对多个字段进行有条件连接的正确方法是什么?或者,当表b中有空时,有没有方法强制条件等于true?或者,有没有方法强制表b中的空与表a中的任何值匹配以返回true
非常感谢您的帮助!

cxfofazt

cxfofazt1#

你可以用 exists :

select a.*
from a
where exists (select 1
              from b
              where (b.id1 = a.id1 or b.id1 is null) and
                    (b.id2 = a.id2 or b.id2 is null) and
                    (b.id3 = a.id3 or b.id3 is null)
             );

你可以用 JOIN 但在某些情况下,这可能会导致重复的行:

select a.*
from a join
     b
    on (b.id1 = a.id1 or b.id1 is null) and
       (b.id2 = a.id2 or b.id2 is null) and
        (b.id3 = a.id3 or b.id3 is null);
bxpogfeg

bxpogfeg2#

您可以通过直接对id2、id3列应用内部联接来获得结果。

SELECT A.ID1,A.ID2,A.ID3,A.VALUE as total FROM TABLE_A A
JOIN TABLE_B B
ON
A.ID2=B.ID2 and
A.ID3=B.ID3

相关问题