Joining partitioned tables in Hive

Posted by gdrx4gfi on 2021-06-26 in Hive


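Suppose I have two partitioned tables, customers and items, both partitioned by the country and state columns.
If I want to retrieve data for a specific country and state, is this the correct way to join the two tables?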

select 
  customers.id, 
  customers.name, 
  items.name, 
  items.value
from
  customers
  join items
  on customers.id == items.customer_id
  and customers.country == 'USA'
  and customers.state == 'TX'
  and items.country == 'USA'
  and items.state == 'TX'

Or should these conditions go in the WHERE clause instead?

where customers.country == 'USA'
and customers.state == 'TX'
and items.country == 'USA'
and items.state == 'TX'

Tags: hive, hiveql

Source: https://stackoverflow.com/questions/41905510/joining-partitioned-tables-in-hive

2 Answers


6ie5vjzr1#

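Yes, partitioned tables can be joined like any other tables. A partition is just a folder structure: partitioning divides a table into related parts based on the values of particular columns (for example date or state). For example, I have the following partitions: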

show partitions table_name1 
year=2016/month=12/day=1/part=10

show partitions table_name2 
year=2016/month=12/day=1/part=1
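To make the "partitions are just folders" point concrete, here is a rough sketch of how a table with the partition layout listed above could be declared. The data columns `col1`, `col2`, `col3` and their types are hypothetical; only the partition columns come from the listing.

```sql
-- Hypothetical DDL: data columns and types are assumed;
-- the partition columns match the SHOW PARTITIONS output above.
CREATE TABLE table_name1 (
  col1 STRING,
  col2 STRING,
  col3 STRING
)
PARTITIONED BY (year INT, month INT, day INT, part INT);

-- Each partition is stored as a nested directory under the table location, e.g.
--   .../table_name1/year=2016/month=12/day=1/part=10
-- Partition columns are queried like ordinary columns; filtering on them
-- lets Hive read only the matching directories (partition pruning).
```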

Now we can join the tables like this:

SELECT i.col1, c.col1
FROM (SELECT * FROM table_name1 WHERE year=2016 AND month=12 AND day=1) i
JOIN (SELECT * FROM table_name2 WHERE year=2016 AND month=12 AND day=1) c
ON i.col2 = c.col2
AND i.col3 = c.col3
-- every non-aggregated column in the SELECT list must appear in GROUP BY
GROUP BY i.col1, c.col1

Or:

SELECT i.col1, c.col1
FROM table_name1 i
JOIN table_name2 c
ON i.col2 = c.col2
AND i.col3 = c.col3
AND i.year=2016 AND i.month=12 AND i.day=1
AND c.year=2016 AND c.month=12 AND c.day=1
GROUP BY i.col1, c.col1
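One way to check that either form actually prunes partitions is Hive's `EXPLAIN DEPENDENCY`, which lists the tables and partitions a query would read. This is a sketch reusing the answer's placeholder table and column names; the exact output format differs between Hive versions, so compare the `input_partitions` entries rather than expecting a fixed layout.

```sql
-- Prints the input tables and input partitions the query would read.
-- If pruning works, input_partitions should list only the
-- year=2016/month=12/day=1 partitions of each table.
EXPLAIN DEPENDENCY
SELECT i.col1, c.col1
FROM table_name1 i
JOIN table_name2 c
  ON i.col2 = c.col2
 AND i.col3 = c.col3
 AND i.year = 2016 AND i.month = 12 AND i.day = 1
 AND c.year = 2016 AND c.month = 12 AND c.day = 1
GROUP BY i.col1, c.col1;
```

Running the same command on the subquery form should list the same partitions if both forms are pruned equally.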

2021-06-26

14ifxucb2#

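For simple queries, Hive pushes the predicates down before the reduce phase, so in this case putting the conditions in the ON clause or in the WHERE clause performs the same. However, if you write other queries that compare fields across tables (e.g. table1.a < table2.b), Hive performs the join first and applies the WHERE condition at the end (in the reduce phase), as most relational databases do.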
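The ON-versus-WHERE equivalence described above applies to inner joins. With an outer join, moving a filter between ON and WHERE changes the result, not just the plan: a condition on the preserved (left) table placed in ON only decides which rows match, so every left-table row is still returned and that condition cannot prune its partitions. A sketch using the question's `customers` and `items` tables and columns; the `LEFT JOIN` variant is added here only to illustrate the difference.

```sql
-- Inner join: these two queries return the same rows. Per the answer above,
-- Hive pushes the single-table predicates down in both forms.
SELECT customers.id, customers.name, items.name, items.value
FROM customers
JOIN items
  ON customers.id = items.customer_id
 AND customers.country = 'USA' AND customers.state = 'TX'
 AND items.country = 'USA' AND items.state = 'TX';

SELECT customers.id, customers.name, items.name, items.value
FROM customers
JOIN items
  ON customers.id = items.customer_id
WHERE customers.country = 'USA' AND customers.state = 'TX'
  AND items.country = 'USA' AND items.state = 'TX';

-- Left outer join: the filter on the left table belongs in WHERE; in ON it
-- would not remove rows, so customers from every country/state would come
-- back (with NULL item columns). The filter on the right table stays in ON
-- so that TX customers without matching TX items are still kept.
SELECT customers.id, customers.name, items.name, items.value
FROM customers
LEFT JOIN items
  ON customers.id = items.customer_id
 AND items.country = 'USA' AND items.state = 'TX'
WHERE customers.country = 'USA' AND customers.state = 'TX';
```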

2021-06-26
