上个月数据的自联接表,用于添加丢失的记录

e7arh2l6  于 2021-06-27  发布在  Hive
关注(0)|答案(1)|浏览(244)

我正在尝试将impala表与上个月的数据连接起来,以检查本月丢失的记录。我在源表中有员工记录。如果员工本月不在,但上个月在,则需要将该员工标记为“已终止”
尝试使用日期条件和员工姓名执行左侧外部联接,但未返回丢失的记录。
员工本月等于员工上月
本报告月等于上一报告月

Input Data:

+---------+---------+-----------+----------------+
|employee | branch  | hire_date | reporting_month|
+---------+---------+-----------+----------------+
| James   | EE      | 20170101  |   20190131     |
+---------+---------+-----------+----------------+
| Judy    | GIP     | 20181014  |   20190131     |
+---------+---------+-----------+----------------+
| James   | EE      | 20170101  |   20190228     |
+---------+---------+-----------+----------------+
| Judy    | GIP     | 20181014  |   20190228     |
+---------+---------+-----------+----------------+
| James   | EE      | 20170101  |   20190331     |
+---------+---------+-----------+----------------+
| Judy    | GIP     | 20181014  |   20190331     |
+---------+---------+-----------+----------------+
| James   | EE      | 20170101  |   20190430     |
+---------+---------+-----------+----------------+
| Max     | EEI     | 20170201  |   20190430     |
+---------+---------+-----------+----------------+

假设当前报告月份为20190430,员工judy不在,则需要为judy添加记录,术语标志为“已终止”

Expected Output:

+---------+---------+-----------+----------------+-----------+
|employee | branch  | hire_date | reporting_month| Term_flag |
+---------+---------+-----------+----------------+-----------+
| James   | EE      | 20170101  |   20190131     | NULL      |
+---------+---------+-----------+----------------+-----------+
| Judy    | GIP     | 20181014  |   20190131     | NULL      |
+---------+---------+-----------+----------------+-----------+
| James   | EE      | 20170101  |   20190228     | NULL      |
+---------+---------+-----------+----------------+-----------+
| Judy    | GIP     | 20181014  |   20190228     | NULL      |
+---------+---------+-----------+----------------+-----------+
| James   | EE      | 20170101  |   20190331     | NULL      |
+---------+---------+-----------+----------------+-----------+
| Judy    | GIP     | 20181014  |   20190331     | NULL      |
+---------+---------+-----------+----------------+-----------+
| James   | EE      | 20170101  |   20190430     | NULL      |
+---------+---------+-----------+----------------+-----------+
| Judy    | GIP     | 20181014  |   20190430     |Terminated | 
+---------+---------+-----------+----------------+-----------+
| Max     | EEI     | 20170201  |   20190430     | NULL      |
+---------+---------+-----------+----------------+-----------+
odopli94

odopli941#

我不知道你的神奇约会在哪里 20190430 来自。基本思想是 union all 具体如下:

select employee, branch, hire_date, reporting_month, null as term_flag
from input
union all
select employee, branch, hire_date, 20190430 as reporting_month, 'terminated'
from (select i.*,
             row_number() over (order by reporting_month desc) as seqnum
      from input i
     ) i
where seqnum = 1 and
      months_add(trunc(reporting_month, 'MON') , 1) < '2019-04-01';

月份的计算可能有点棘手,因为你的日期是一个月的最后一天而不是第一天。

相关问题