Computing per-day values from cumulative data with Pig Latin

Posted by bzzcjhmw on 2021-06-24 in Pig

I have some cumulative data from JMeter. Example:

Date          KWH
2018-12-01    50
2018-12-02    90
2018-12-03    150

I want to extract the actual KWH value for each day (each day's reading minus the previous day's) using Pig code.
Expected:

Date         KWH
2018-12-02   40
2018-12-03   60

Answer #1 (jk9hmnmh)

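In Hadoop, referring to the previous record is difficult because the input is split and distributed across different tasks. I think the approach below works, but it is inefficient compared with a single process that reads the data sequentially. It shifts every reading forward by one day, negates it, and joins it back onto the original data by date, so it only produces a row when the previous calendar day is present.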

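-- Load the cumulative readings (PigStorage splits on tabs by default).
A = LOAD 'test.txt' AS (a1:chararray, a2:int);
B = FOREACH A GENERATE ToDate(a1, 'yyyy-MM-dd', 'UTC') AS date, a2;
-- Shift each reading forward by one day and negate it.
C = FOREACH B GENERATE AddDuration(date, 'P1D') AS nextdate, -a2 AS a2;
-- Pair each day with the shifted, negated reading of the day before.
D = JOIN B BY date, C BY nextdate;
-- Current reading plus the negated previous reading is the daily value.
E = FOREACH D GENERATE B::date AS date, B::a2 + C::a2 AS value;
DUMP E;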
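
For comparison, here is a minimal sketch of the single-process, sequential approach mentioned above. It is written in Python rather than Pig, and the function name `daily_deltas` and the in-memory sample are illustrative assumptions, not part of the original answer. Unlike the join, it subtracts the previous record whatever its date, so a gap in the dates would yield a multi-day difference instead of dropping the row.

```python
from datetime import date


def daily_deltas(readings):
    """Turn cumulative (date, kwh) readings into per-day values.

    Hypothetical helper for illustration only.
    """
    # Sort by date so every record directly follows its predecessor.
    ordered = sorted(readings)
    deltas = []
    # Walk consecutive pairs: (previous record, current record).
    for (_, prev_kwh), (day, kwh) in zip(ordered, ordered[1:]):
        deltas.append((day, kwh - prev_kwh))
    return deltas


# Sample data copied from the question.
sample = [
    (date(2018, 12, 1), 50),
    (date(2018, 12, 2), 90),
    (date(2018, 12, 3), 150),
]

for day, kwh in daily_deltas(sample):
    print(day.isoformat(), kwh)
```

On the sample from the question this prints the two expected rows, 2018-12-02 with 40 and 2018-12-03 with 60. The first date has no predecessor and therefore produces no output, which matches the inner join in the Pig script.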
