How many MapReduce jobs does Hive use for a join?

k4aesqcs, posted on 2021-04-02 in Hive

Suppose we have four tables, Employee, Department, Location and Skills, and we run the following query in Hive:

SELECT * 
FROM Department 
JOIN Employee ON (Department.emp_id = Employee.emp_id) 
JOIN Location ON (Employee.location_id = Location.location_id) 
JOIN Skills ON (Employee.skill_code = Skills.skill_code)

How many MapReduce jobs will run in this case?
What if the query above is modified to:

SELECT * 
FROM Department 
JOIN Employee ON (Department.emp_id = Employee.emp_id) 
JOIN Location ON (Employee.location_id = Location.location_id) 
JOIN Skills ON (Employee.emp_id = Skills.emp_id)

How many MapReduce jobs will run in this case?
Is the answer 3? Is the number of joins always equal to the number of mappers?

Answer #1, by ldioqlga

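Under the hood, Hive executes joins as MapReduce jobs, and joins on the same join column are combined into a single MapReduce job. The number of MR jobs therefore depends on the number of distinct join columns, not on the number of joins.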
select * from department join employee on (department.emp_id = employee.emp_id) join location on (employee.location_id = location.location_id) join skills on (employee.skill_code = skills.skill_code)
The query above uses 3 distinct join columns (emp_id, location_id, skill_code), so it runs 3 MR jobs.
select * from department join employee on (department.emp_id = employee.emp_id) join location on (employee.location_id = location.location_id) join skills on (employee.emp_id = skills.emp_id)
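The query above uses only 2 distinct join columns (emp_id, location_id), because emp_id appears in two of the three join conditions, so it runs 2 MR jobs.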
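The answer's rule of thumb can be illustrated with a small Python sketch. This is a simplified model of the heuristic stated in this answer, not Hive's actual query planner: it assumes every join condition is an equi-join written as `table.column = table.column` with the same column name on both sides, counts one MR job per distinct join column, and ignores any other optimizer behavior. The names `count_joins`, `estimate_mr_jobs`, `QUERY_1` and `QUERY_2` are hypothetical.

```python
import re

# Hypothetical helpers: a simplified model of the answer's heuristic,
# NOT Hive's real planner. Assumes equi-joins written as
# "table.column = table.column" with the same column name on both sides.
JOIN_CONDITION = re.compile(r"\w+\.(\w+)\s*=\s*\w+\.(\w+)")
JOIN_KEYWORD = re.compile(r"\bjoin\b", re.IGNORECASE)


def count_joins(query: str) -> int:
    """Count JOIN keywords (the asker's 'one job per join' hypothesis)."""
    return len(JOIN_KEYWORD.findall(query))


def estimate_mr_jobs(query: str) -> int:
    """Estimate MR jobs as the number of distinct join-key columns."""
    join_columns = set()
    for left_column, _right_column in JOIN_CONDITION.findall(query):
        # Joins on the same column are assumed to share one MR job.
        join_columns.add(left_column.lower())
    return len(join_columns)


QUERY_1 = """
SELECT * FROM Department
JOIN Employee ON (Department.emp_id = Employee.emp_id)
JOIN Location ON (Employee.location_id = Location.location_id)
JOIN Skills ON (Employee.skill_code = Skills.skill_code)
"""

QUERY_2 = """
SELECT * FROM Department
JOIN Employee ON (Department.emp_id = Employee.emp_id)
JOIN Location ON (Employee.location_id = Location.location_id)
JOIN Skills ON (Employee.emp_id = Skills.emp_id)
"""

for name, query in [("query 1", QUERY_1), ("query 2", QUERY_2)]:
    print(name, "joins:", count_joins(query),
          "estimated MR jobs:", estimate_mr_jobs(query))
```

Running it reports 3 joins and 3 estimated jobs for the first query, but 3 joins and only 2 estimated jobs for the second, which is the distinction the answer draws: under this heuristic the job count follows the distinct join columns rather than the number of joins.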
