为相同的记录组合在配置单元中增加时间戳

toe95027  于 2021-05-29  发布在  Hadoop
关注(0)|答案(1)|浏览(271)

我在配置单元表中有一个数据集

input1,input2,input_time
key1,val1,2017-02-03 00:00:00
key1,val1,2017-02-03 00:00:00
key1,val2,2017-02-03 00:00:00
key1,val2,2017-02-03 00:00:00
key2,val1,2017-02-03 00:00:00

列(input1,input2)将形成唯一的组合记录。对于相同的唯一组合,我想用秒增加input_time列,即“2017-02-03 00:00:01”。
对于相同的组合,例如我有65条记录,一旦秒达到59秒,它应该递增(分+秒),即“2017-02-03 00:01:01”
我们如何为相同的记录组合增加时间,在Hive中可能吗?

Expected output:
input1,input2,input_time
key1,val1,2017-02-03 00:00:01
key1,val1,2017-02-03 00:00:02
key1,val2,2017-02-03 00:00:01
key1,val2,2017-02-03 00:00:02
key2,val1,2017-02-03 00:00:01
siotufzp

siotufzp1#

您可以使用window函数为要添加的每一行生成一个临时索引。

select 
   k, v , unix_timestamp(ts) as ts, 
   row_number() over ( partition by k,v ) as rn  
from ts_test

这将产生:

+----+----+----------+---+
|   k|   v|        ts| rn|
+----+----+----------+---+
|key1|val1|1486101600|  1|
|key1|val1|1486101600|  2|
|key1|val2|1486101600|  1|
|key1|val2|1486101600|  2|
|key2|val1|1486101600|  1|
+----+----+----------+---+

现在您可以继续并将其添加到您的时间字符串中,因为它已经是iso格式的了。

SELECT a.k, a.v, from_unixtime(ts+rn) as newts from 
   ( select k, v , unix_timestamp(ts) as ts, row_number() over ( partition by k,v ) as rn  
from ts_test )a 

+----+----+-------------------+
|   k|   v|              newts|
+----+----+-------------------+
|key1|val1|2017-02-03 00:00:01|
|key1|val1|2017-02-03 00:00:02|
|key1|val2|2017-02-03 00:00:01|
|key1|val2|2017-02-03 00:00:02|
|key2|val1|2017-02-03 00:00:01|
+----+----+-------------------+

如@dudumarkovitz所述,这也可以通过单一选择实现:

select 
   k, v , 
   from_unixtime(unix_timestamp(ts) + row_number() over ( partition by k,v order by v asc ) ) 
from ts_test

相关问题