Suppose we have four tables, Employee, Department, Location, and Skills, and we run the following query in Hive.
SELECT *
FROM Department
JOIN Employee ON (Department.emp_id = Employee.emp_id)
JOIN Location ON (Employee.location_id = Location.location_id)
JOIN Skills ON (Employee.skill_code = Skills.skill_code)
How many MapReduce jobs will run in this case?
Now suppose the query is changed to
SELECT *
FROM Department
JOIN Employee ON (Department.emp_id = Employee.emp_id)
JOIN Location ON (Employee.location_id = Location.location_id)
JOIN Skills ON (Employee.emp_id = Skills.emp_id)
How many MapReduce jobs will run in this case?
Is the answer 3? Is it always the case that the number of joins equals the number of mappers?
1 answer
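Under the hood, Hive runs a join as a MapReduce job. Each distinct join column is translated into one MapReduce job, and joins that share the same join column are merged into a single job, so the job count depends on the number of distinct join columns rather than on the number of joins.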
select * from department join employee on (department.emp_id = employee.emp_id) join location on (employee.location_id = location.location_id) join skills on (employee.skill_code = skills.skill_code)
The query above uses 3 distinct join columns (emp_id, location_id, skill_code), so 3 MR jobs will run.
select * from department join employee on (department.emp_id = employee.emp_id) join location on (employee.location_id = location.location_id) join skills on (employee.emp_id = skills.emp_id)
The query above uses 2 distinct join columns (emp_id, location_id), so 2 MR jobs will run.
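The counting rule used in this answer can be sketched in a few lines of Python. This is only a rough model of the rule of thumb, not Hive's planner: the helper name estimate_mr_jobs and the regular expression are made up for illustration, and the sketch assumes every join condition has the form (table.column = table.column) with the same column name on both sides.

```python
import re

# Hypothetical helper: models the rule of thumb from the answer,
# NOT Hive's actual query planner.
# Matches conditions of the form (table.column = table.column).
JOIN_CONDITION = re.compile(r"\(\s*(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)\s*\)")


def estimate_mr_jobs(query: str) -> int:
    """Estimate the MR job count as the number of distinct join columns."""
    join_columns = set()
    for _left_table, left_col, _right_table, _right_col in JOIN_CONDITION.findall(query):
        # Assumption: both sides of a condition use the same column name,
        # so the left-hand column is enough to identify the join key.
        join_columns.add(left_col.lower())
    return len(join_columns)


QUERY_1 = """
SELECT *
FROM Department
JOIN Employee ON (Department.emp_id = Employee.emp_id)
JOIN Location ON (Employee.location_id = Location.location_id)
JOIN Skills ON (Employee.skill_code = Skills.skill_code)
"""

QUERY_2 = """
SELECT *
FROM Department
JOIN Employee ON (Department.emp_id = Employee.emp_id)
JOIN Location ON (Employee.location_id = Location.location_id)
JOIN Skills ON (Employee.emp_id = Skills.emp_id)
"""

if __name__ == "__main__":
    print(estimate_mr_jobs(QUERY_1))
    print(estimate_mr_jobs(QUERY_2))
```

To see what Hive actually plans on a given cluster, prefix the query with EXPLAIN and count the MapReduce stages in the output. The real number can differ from this estimate, for example when Hive converts a join into a map-side join.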