sql—如何在配置单元中两个不同表的两个数组中找到相同的元素?

nxagd54h  于 2021-06-24  发布在  Hive
关注(0)|答案(1)|浏览(536)

我有两张表,如下所示:
表1:

id               sid
1   |   ['101', '102', '103']
2   |   ['102', '101', '103']
3   |   ['103', '101', '102']

表2:

id  |            sid
1   |   ['101', '102', '103']
3   |   ['102', '103']

我希望得到下表:

id               sid
1   |   ['101', '102', '103']
2   |   ['102', '101', '103']
3   |   ['103', '102']

说明:我希望以表1中相同的顺序选择表1.sid和表2.sid中相同的元素。此外,如果表1中的id在表2中不存在,则保持sid在表1中的状态。我该怎么办?

9jyewag0

9jyewag01#

你可以用 posexplode() 基本上做你想做的事。当然,所有这些数组在hive中比在其他数据库中更为复杂,尤其是按您想要的顺序获取结果:

select t1.id, collect_list(sid2)
from (select t1.id, t2.sid2, t1.pos1
      from (table1 t1 lateral view
            posexplode(t1.sid) as pos1, sid1
           ) left join
           (table2 t2 lateral view
            posexplode(t2.sid) as pos2, sid2
           )
           on t2.id = t1.id and t2.sid2 = t1.sid1
      distribute by t1.id
      order by t1.id, t1.pos1
     ) t

像这样:
用t as(select t1.id,collect \u list(sid2)as sid from(select t1.id,t2.sid2,t1.pos1 from(table1 t1 lateral view posexplode(t1.sid)as pos1,sid1)left join(table2 t2 lateral view posexplode(t2.sid)as pos2,sid2)on t2.id=t1.id and t2.sid2=t1.sid1 distribute by t1.id order by t1.id,t1.pos1)t select t1.id,(大小(t.sid)=0时的情况下,则t1.sid否则t.sid结束)从t1左连接t on t1.id=t.id

相关问题