Hive - Update records in one table with today's date if they are not found in another table?

ki0zmccv  posted on 2021-06-27  in Hive

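I currently have a master results table (test1) that stores all of my issue records, and a second table (test2) that is refreshed about once a week. I am trying to find the records that are missing from the weekly update and then set a date on them in the master results table, because a record that no longer appears in the weekly run has been corrected in the system.
I am also trying to add the records from test2 to test1 if they are not already in that table.
This works: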

insert into table test1 (id, name, code)
select * from test2 t2 where t2.id not in (select id from test1);

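I am also trying to update the 'Corrected_date' column of test1 so that it shows the current_date for all records that are found in test1 but not in test2. Sample data is below:
Table 1 (test1)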

ID    NAME    CODE    CORRECTED_DATE
1     TEST    3    
29    TEST2   90

Table 2 (test2)

ID    NAME    CODE  
12    TEST5   20
1     TEST    3

Expected end result for Table 1

ID    NAME    CODE    CORRECTED_DATE
1     TEST    3       
29    TEST2   90       3/13/2019
12    TEST5   20

Answer #1 (7jmck4yq)

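Overwrite the table using a full join. FULL JOIN returns the joined records, plus the rows from the left table that have no match, plus the rows from the right table that have no match. You can implement the logic with CASE expressions like this: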

insert OVERWRITE table test1

select 
      --select t1 if both or t1 only exist, t2 if only t2 exists
      case when t1.ID is null then t2.ID   else t1.ID   end as ID,
      case when t1.ID is null then t2.NAME else t1.NAME end as NAME,
      case when t1.ID is null then t2.CODE else t1.CODE end as CODE,

      --if found in t1 but not in t2 then current_date else leave as is
      case when (t1.ID is not null) and (t2.ID is null) then current_date else t1.CORRECTED_DATE end as CORRECTED_DATE 
  from test1 t1 
       FULL OUTER JOIN test2 t2 on t1.ID=t2.ID;

See also this similar question about incremental updates. Your logic is different, but the approach is the same: https://stackoverflow.com/a/37744071/2700344
Testing with your data:

with test1 as (
select stack (2,
1, 'TEST',    3,null,    
29,'TEST2',   90 , null
             ) as (ID,NAME,CODE,CORRECTED_DATE)
),

     test2 as (
select stack (2,
              12,'TEST5',20,
              1,'TEST',3
             ) as (ID, NAME, CODE)
)

select 
      --select t1 if both or t1 only exist, t2 if only t2 exists
      case when t1.ID is null then t2.ID   else t1.ID   end as ID,
      case when t1.ID is null then t2.NAME else t1.NAME end as NAME,
      case when t1.ID is null then t2.CODE else t1.CODE end as CODE,

      --if found in test1 but not in test2 then current_date else leave as is
      case when (t1.ID is not null) and (t2.ID is null) then current_date else t1.CORRECTED_DATE end as CORRECTED_DATE 
  from test1 t1 
       FULL OUTER JOIN test2 t2 on t1.ID=t2.ID;

Result:

OK
id      name    code    corrected_date
1       TEST    3       NULL
12      TEST5   20      NULL
29      TEST2   90      2019-03-14
Time taken: 41.727 seconds, Fetched: 3 row(s)

The result is as expected.
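
For a quick sanity check without a Hive cluster, the same row-by-row rule can be sketched in plain Python. This is only an illustration of the FULL OUTER JOIN plus CASE logic, not Hive code: the function name `merge_tables` and the dict-based table layout are hypothetical, and it assumes ID is unique within each table (with duplicate IDs a real join would multiply rows).

```python
from datetime import date


def merge_tables(test1, test2, today):
    """Emulate the FULL OUTER JOIN + CASE logic on dicts keyed by ID.

    test1 maps ID -> (NAME, CODE, CORRECTED_DATE)
    test2 maps ID -> (NAME, CODE)
    Assumes IDs are unique within each table.
    """
    merged = {}
    # The union of the keys plays the role of the full outer join.
    for row_id in sorted(test1.keys() | test2.keys()):
        if row_id in test1:
            # Row exists in test1: keep its values.
            name, code, corrected = test1[row_id]
            if row_id not in test2:
                # Found in test1 but not in test2: stamp the date.
                corrected = today
        else:
            # Row exists only in test2: add it with no corrected date.
            name, code = test2[row_id]
            corrected = None
        merged[row_id] = (name, code, corrected)
    return merged


# Sample data from the question.
test1 = {1: ("TEST", 3, None), 29: ("TEST2", 90, None)}
test2 = {12: ("TEST5", 20), 1: ("TEST", 3)}

# A fixed date stands in for current_date so the run is repeatable.
result = merge_tables(test1, test2, date(2019, 3, 14))
for row_id, row in result.items():
    print(row_id, *row)
```

With the fixed date above, the three rows match the Hive output: IDs 1 and 12 have no corrected date, and ID 29 is stamped with 2019-03-14.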
