我在配置单元中有两个表,它们是使用scd类型2管理的(https://en.wikipedia.org/wiki/slowly_changing_dimension#type_2:\添加\新\行)。基本上,每个记录有两列valid\u start\u date和valid\u end\u date来指定它的有效期。
两张表:
雇员(注意雇员id 1在2010年10月20日更改了地址)
employee_id employee_name employee_address valid_start_date valid_end_date
1 Bob 123 XXX 2010-01-01 2010-10-20
2 Alice 999 YYY 2010-01-01 9999-12-31
1 Bob 567 ZZZ 2010-10-20 9999-12-31
工作场所(请注意,雇员id 1在2010年12月31日更改了建筑)
employee_id building_id valid_start_date valid_end_date
1 1 2010-01-01 2010-12-31
1 2 2010-12-31 9999-12-31
2 2 2010-01-01 9999-12-31
我想加入他们,得到这个结果
employee_id employee_name employee_address building_id valid_start_date valid_end_date
1 Bob 123 XXX 1 2010-01-01 2010-10-20
1 Bob 567 ZZZ 1 2010-10-20 2010-12-31
1 Bob 567 ZZZ 2 2010-12-31 9999-12-31
2 Alice 999 YYY 1 2010-01-01 9999-12-31
这就是我提出的问题
WITH
valid_dates AS (
SELECT
employee_id
, valid_start_date valid_date
FROM
employee
UNION
SELECT
employee_id
, valid_end_date valid_date
FROM
employee
UNION
SELECT
employee_id
, valid_start_date valid_date
FROM
workplace
UNION
SELECT
employee_id
, valid_end_date valid_date
FROM
workplace
),
valid_date_ranges AS (
SELECT
employee_id
, valid_start_date
, valid_end_date
FROM (
SELECT
employee_id
, valid_date valid_start_date
, LEAD(valid_date, 1) OVER (
PARTITION BY employee_id
ORDER BY valid_date) valid_end_date
FROM
valid_dates
) valid_date_ranges_with_null
WHERE
valid_end_date IS NOT NULL
)
SELECT
vdr.employee_id
, e.employee_name
, e.employee_address
, wp.building_id
, vdr.valid_start_date
, vdr.valid_end_date
FROM
employee e
INNER JOIN
valid_date_ranges vdr
ON
e.employee_id = vdr.employee_id
LEFT OUTER JOIN -- there may be employees without workplace
workplace wp
ON
wp.employee_id = vdr.employee_id
WHERE
e.valid_start_date < vdr.valid_end_date
AND e.valid_end_date > vdr.valid_start_date
AND wp.valid_start_date < vdr.valid_end_date
AND wp.valid_end_date > vdr.valid_start_date
;
我似乎产生了正确的结果,但我希望我的查询运行得更快(这是目前我的管道瓶颈)。另外,我还要做同样的事情,最多5个表连接在一起,每个表最多30亿行,所以我真的希望有办法优化这个查询。你能帮帮我吗?谢谢您!
我在hive2.1.0上,顺便说一下,还没有非equi连接。
暂无答案!
目前还没有任何答案,快来回答吧!