Merging duplicate records on a Hive table

Posted by ffscu2ro on 2021-06-02 in Hadoop


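I have the following table, which receives incremental updates. I need to write a plain Hive query that merges the rows sharing the same key, keeping the most recent non-null value in each column.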

Key | A    | B    | C    | Timestamp
K1  | X    | Null | Null | 2015-05-03
K1  | Null | Y    | Z    | 2015-05-02
K1  | Foo  | Bar  | Baz  | 2015-05-01

Desired result:

Key | A | B | C | Timestamp
K1  | X | Y | Z | 2015-05-03

hadoop Hive hiveql

Source: https://stackoverflow.com/questions/46437463/merging-duplicate-records-on-hive-table

1 Answer


idv4meu81#

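Use the first_value() function to get the most recent non-null value of each column. The sort keys need to be concatenated into a single key, because last_value works with only one sort key. In the query below, the CASE prefix ('2' for non-null, '1' for null) puts the non-null rows first when sorting in descending order, and the timestamp then picks the latest of them.
Demo: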

select distinct
       key,
       -- Non-null rows get prefix '2' and sort first in descending order; ties are broken by the latest timestamp
       first_value(A) over (partition by Key order by concat(case when A is null then '1' else '2' end, '_', Timestamp) desc) A,
       first_value(B) over (partition by Key order by concat(case when B is null then '1' else '2' end, '_', Timestamp) desc) B,
       first_value(C) over (partition by Key order by concat(case when C is null then '1' else '2' end, '_', Timestamp) desc) C,
       max(timestamp) over (partition by key) timestamp
from
(  -- Replace this subquery with your table
   select 'K1' key, 'X'   a, Null  b, Null  c, '2015-05-03' timestamp union all
   select 'K1' key, null  a, 'Y'   b, 'Z'   c, '2015-05-02' timestamp union all
   select 'K1' key, 'Foo' a, 'Bar' b, 'Baz' c, '2015-05-01' timestamp
) s
;

Output:

OK
key     a       b       c       timestamp
K1      X       Y       Z       2015-05-03
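
A possible variation, offered only as a sketch: the Hive windowing documentation describes an optional boolean second argument to first_value() that skips nulls, which would avoid building the concatenated sort key. The table name my_table is hypothetical, and the backticks around timestamp are an assumption for Hive versions that treat it as a reserved keyword. The explicit frame matters, because with the default frame the newest row would see only itself and could still return null.

```sql
-- Sketch only: my_table is a hypothetical stand-in for the table in the question.
-- Assumes first_value(expr, true) skips NULLs in your Hive version.
select distinct
       key,
       -- Full-partition frame so every row sees all rows for its key
       first_value(a, true) over (partition by key order by `timestamp` desc
                                  rows between unbounded preceding and unbounded following) as a,
       first_value(b, true) over (partition by key order by `timestamp` desc
                                  rows between unbounded preceding and unbounded following) as b,
       first_value(c, true) over (partition by key order by `timestamp` desc
                                  rows between unbounded preceding and unbounded following) as c,
       max(`timestamp`) over (partition by key) as `timestamp`
from my_table;
```

Because every row of a key computes the same values over the full frame, select distinct collapses them to one row per key, just as in the demo query.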

Answered 2021-06-02
