How many MapReduce jobs does Hive use in a join?

Posted by k4aesqcs on 2021-04-02 in Hive

Suppose we have four tables, Employee, Department, Location, and Skills, and we run the following query in Hive.

SELECT *
FROM Department
JOIN Employee ON (Department.emp_id = Employee.emp_id)
JOIN Location ON (Employee.location_id = Location.location_id)
JOIN Skills ON (Employee.skill_code = Skills.skill_code)

How many MapReduce jobs will run in this case?
Now suppose the query above is changed to

SELECT *
FROM Department
JOIN Employee ON (Department.emp_id = Employee.emp_id)
JOIN Location ON (Employee.location_id = Location.location_id)
JOIN Skills ON (Employee.emp_id = Skills.emp_id)

How many MapReduce jobs will run in this case?
Is the answer 3? Is it always the case that the number of joins equals the number of mappers?

Hive hiveql

Source: https://stackoverflow.com/questions/64904448/how-many-number-of-mapreduce-jobs-are-used-by-hive-in-a-join

1 Answer

ldioqlga1#

Under the hood, a join operation runs as a MapReduce job. Internally, each distinct join column is converted into one MapReduce job, so the count depends on the number of distinct join columns, not on the number of joins.
select * from department join employee on (department.emp_id = employee.emp_id) join location on (employee.location_id = location.location_id) join skills on (employee.skill_code = skills.skill_code)
The query above uses 3 distinct join columns (emp_id, location_id, skill_code), so there will be 3 MR jobs.
select * from department join employee on (department.emp_id = employee.emp_id) join location on (employee.location_id = location.location_id) join skills on (employee.emp_id = skills.emp_id)
The query above uses 2 distinct join columns (emp_id, location_id), so there will be 2 MR jobs.
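
The rule of thumb in this answer (one MR job per distinct join column) can be sketched in a few lines of Python. This is only an illustration of the counting rule, not Hive's planner: the function name `estimate_mr_jobs` and the regular expression are hypothetical helpers, and the sketch assumes every join condition has the form `ON (table.col = table.col)` with the same column name on both sides, as in the two queries from the question.

```python
import re

# Matches "ON (table.col = table.col)" and captures the two column names.
JOIN_COND = re.compile(
    r"ON\s*\(\s*\w+\.(\w+)\s*=\s*\w+\.(\w+)\s*\)", re.IGNORECASE
)


def estimate_mr_jobs(query: str) -> int:
    """Count distinct join columns, following the answer's rule of thumb."""
    keys = set()
    for left_col, _right_col in JOIN_COND.findall(query):
        # Assumption: both sides share the same column name, so the
        # left-hand name is enough to identify the join key.
        keys.add(left_col.lower())
    return len(keys)


QUERY_1 = """
SELECT *
FROM Department
JOIN Employee ON (Department.emp_id = Employee.emp_id)
JOIN Location ON (Employee.location_id = Location.location_id)
JOIN Skills ON (Employee.skill_code = Skills.skill_code)
"""

QUERY_2 = """
SELECT *
FROM Department
JOIN Employee ON (Department.emp_id = Employee.emp_id)
JOIN Location ON (Employee.location_id = Location.location_id)
JOIN Skills ON (Employee.emp_id = Skills.emp_id)
"""

print(estimate_mr_jobs(QUERY_1))  # 3
print(estimate_mr_jobs(QUERY_2))  # 2
```

The plan on a real cluster may differ, for example when Hive converts joins against small tables into map-side joins. Prefixing the query with EXPLAIN shows the stages Hive actually plans.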

Answered 2021-04-03
