部分更新的记录

eblbsuwk  于 2021-06-25  发布在  Hive
关注(0)|答案(1)|浏览(294)

我在配置单元中有一个目标表和几个增量表。我必须向target添加incrementl表,并为每行添加新的\u id。如果旧的\u id相等,我必须更新名称,并处理\u日期,但新的\u id必须来自前一行。你能帮我吗?

create table target (
    id           string,
    new_id           string,
    name         string,
    process_date string );

create table increment1 (
    id           string,
    name         string,
    process_date string );

create table increment2 like increment1;

insert into table increment1 values
('1', 'Apple', '2020-03-12'),
('3', 'Kiwi', '2020-03-12');

insert into table increment2 values
('3', 'Avocado', '2020-03-13'),
('4', 'Blackberry', '2020-03-12');

我试着在脚本中添加:

with a as (
    select coalesce(i.id, t.id)                     as id,
           coalesce(i.nk, t.new_id)                 as new_id,
           coalesce(i.name, t.name)                 as name,
           coalesce(i.process_date, t.process_date) as process_date
    from target t
             full join (
        select id,
               reflect("java.util.UUID", "randomUUID") as nk,
               name,
               process_date
        from increment1 -- increment2
    ) i on t.id = i.id
)
from a
insert overwrite table target
select *;

在第一次插入之后,我在target中获得了下一个数据:

1       89296ec9-cc0f-4f50-a3dc-22cdaa61f707    Apple   2020-03-12
2       5cd1a5f7-7f86-4daa-8b68-1db5a33a14f8    Orange  2020-03-12
3       2513bd6d-1947-4b41-a033-58ec25c78313    Kiwi    2020-03-12

第二次之后,我在target中得到了下一个数据:

1       89296ec9-cc0f-4f50-a3dc-22cdaa61f707    Apple   2020-03-12
2       5cd1a5f7-7f86-4daa-8b68-1db5a33a14f8    Orange  2020-03-12
3       df683397-c02e-4e6f-a09a-ad2d95f6b797    Avocado 2020-03-13
4       854df396-b345-499b-aa9b-6a6b0b84a92f    Blackberry      2020-03-12

为id=3的行更改了新的\u id。怎么让它变老?

3       2513bd6d-1947-4b41-a033-58ec25c78313    Avocado 2020-03-13
xmq68pz9

xmq68pz91#

如果我正确理解了您的需求,那么您正在尝试在目标上更新/插入(增量加载)。你可以用 merge 执行此操作的声明:

merge into target t
using ( select * from increment1) sub
on sub.id = t.id
when matched then update set t.name = sub.name, t.process_date= sub.process_date
when not matched then insert values (sub.id, sub.name, sub.process_date);

如果您不想这样做,请添加更多的上下文和预期的结果

相关问题