How to use the Over function in piggybank

rjjhvcjd posted on 2021-06-25 in Pig

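I'm not very familiar with Apache Pig and can't work out what goes wrong when I use piggybank's Over function for a cumulative calculation. For the data below, I want the cumulative salary per period for each combination of business and location: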

business|location|period|salary
--------+--------+------+-------
100     |  East  |   1  |  100
100     |  East  |   1  |  55
100     |  East  |   2  |  100
100     |  East  |   3  |  150
100     |  West  |   1  |  150
100     |  West  |   2  |  200
100     |  West  |   3  |  250
200     |  East  |   1  |  50
200     |  East  |   2  |  50
200     |  East  |   3  |  50
200     |  West  |   1  |  80
200     |  West  |   2  |  100
200     |  West  |   3  |  120

The result I want is:

business|location|period|cumulative salary
--------+--------+------+---------------
  100   |  East  |  1   |    155
  100   |  East  |  2   |    255
  100   |  East  |  3   |    405
  100   |  West  |  1   |    150
  100   |  West  |  2   |    350
  100   |  West  |  3   |    600
  200   |  East  |  1   |    50
  200   |  East  |  2   |    100
  200   |  East  |  3   |    150
  200   |  West  |  1   |    80
  200   |  West  |  2   |    180
  200   |  West  |  3   |    300
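To pin down the target semantics independently of Pig, here is a minimal Python sketch (an illustration only, not Pig code and not how piggybank works internally). It sums salaries per (business, location, period) first, which is why the two period-1 rows for 100/East become 155, and then takes a running total over periods within each (business, location) group. The names `ROWS` and `cumulative_by_period` are hypothetical.

```python
from collections import defaultdict
from itertools import accumulate

# Input rows from the question: (business, location, period, salary)
ROWS = [
    (100, "East", 1, 100), (100, "East", 1, 55), (100, "East", 2, 100),
    (100, "East", 3, 150), (100, "West", 1, 150), (100, "West", 2, 200),
    (100, "West", 3, 250), (200, "East", 1, 50), (200, "East", 2, 50),
    (200, "East", 3, 50), (200, "West", 1, 80), (200, "West", 2, 100),
    (200, "West", 3, 120),
]

def cumulative_by_period(rows):
    """Sum salaries per (business, location, period), then take a running
    total over periods within each (business, location) group."""
    per_period = defaultdict(int)
    for business, location, period, salary in rows:
        per_period[(business, location, period)] += salary

    # (business, location) -> [(period, period_total), ...]
    groups = defaultdict(list)
    for (business, location, period), total in per_period.items():
        groups[(business, location)].append((period, total))

    result = []
    for (business, location), periods in sorted(groups.items()):
        periods.sort()  # order by period
        running = accumulate(total for _, total in periods)
        for (period, _), cum in zip(periods, running):
            result.append((business, location, period, cum))
    return result

for row in cumulative_by_period(ROWS):
    print(row)
```

The printed tuples correspond row for row to the desired result table above.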

According to the documentation, I should be able to do this with:

REGISTER /opt/mapr/pig/pig-0.12/contrib/piggybank/java/piggybank.jar;
A = LOAD '/user/sliang/pig/testData' USING PigStorage(',') as (business:long, location:chararray, period:long, salary:long);
B = group A by (business, location);
C = foreach B {
    C1 = order A by period;
    generate flatten(Stitch(C1, Over(C1.salary, 'sum(long)')));
};
D = foreach C generate business, location, period, $9;

But I get an error starting at C:

[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve Stitch using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

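I searched on Google, but there isn't much information about this... I also checked the jar with other piggybank functions and they work, so I don't think the problem is that piggybank isn't registered correctly. I'm using Pig 0.12.
Any help is much appreciated. Thank you!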

xriantvc (answer 1):

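Use the full package path for the Stitch and Over functions, i.e., replace Stitch with org.apache.pig.piggybank.evaluation.Stitch and Over with org.apache.pig.piggybank.evaluation.Over. The error message lists the package prefixes Pig searches when it resolves a bare function name, and the piggybank evaluation package is not among them. If you want to avoid these long package names in your Pig script, you can DEFINE your own aliases (something like the following) and use them in the script instead: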

DEFINE MYOVER org.apache.pig.piggybank.evaluation.Over;  

DEFINE MYSTITCH org.apache.pig.piggybank.evaluation.Stitch;
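Why the bare name fails can be modelled roughly as follows. This is a simplified Python sketch, not Pig's actual code: it assumes resolution tries each import prefix from the ERROR 1070 message in turn and uses the first prefix + name that matches a known class. `AVAILABLE_CLASSES` is a hypothetical stand-in for what the classpath contains after REGISTER, and `ALIASES` stands in for the DEFINE statements above.

```python
# Import prefixes copied from the ERROR 1070 message in the question.
PIG_IMPORTS = ["", "java.lang.", "org.apache.pig.builtin.", "org.apache.pig.impl.builtin."]

# Hypothetical stand-in for the classes visible after REGISTER piggybank.jar.
AVAILABLE_CLASSES = {
    "org.apache.pig.builtin.SUM",
    "org.apache.pig.piggybank.evaluation.Over",
    "org.apache.pig.piggybank.evaluation.Stitch",
}

# Hypothetical model of the DEFINE aliases: alias -> fully qualified class name.
ALIASES = {
    "MYOVER": "org.apache.pig.piggybank.evaluation.Over",
    "MYSTITCH": "org.apache.pig.piggybank.evaluation.Stitch",
}

def resolve(name, imports=PIG_IMPORTS, classes=AVAILABLE_CLASSES, aliases=ALIASES):
    """Return the class a function name resolves to, or None where Pig
    would report ERROR 1070 ("Could not resolve ... using imports")."""
    name = aliases.get(name, name)  # a DEFINE alias expands to its full name
    for prefix in imports:
        candidate = prefix + name
        if candidate in classes:
            return candidate
    return None

print(resolve("SUM"))       # found via the org.apache.pig.builtin. prefix
print(resolve("Stitch"))    # None: no listed prefix reaches the piggybank package
print(resolve("MYSTITCH"))  # the alias expands to the full piggybank path
```

The empty prefix is what makes a fully qualified name (or an alias expanding to one) resolve, while `Stitch` alone never reaches `org.apache.pig.piggybank.evaluation`.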

Updated Pig script:

REGISTER /opt/mapr/pig/pig-0.12/contrib/piggybank/java/piggybank.jar;
A = LOAD '/user/sliang/pig/testData' USING PigStorage(',') as (business:long, location:chararray, period:long, salary:long);
B = group A by (business, location);
C = foreach B {
    C1 = order A by period;
    generate flatten(org.apache.pig.piggybank.evaluation.Stitch(C1, org.apache.pig.piggybank.evaluation.Over(C1.salary, 'sum(long)')));
};
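-- Stitch appends the Over result after the 4 input fields, so it is field $4
D = foreach C generate business, location, period, $4;

-- Rows with a repeated period (e.g. 100/East/1) each get a running total;
-- keep only the last (highest-ranked) row of each period
E = RANK D;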
F = GROUP E BY (stitched::business,stitched::location,stitched::period);
G = FOREACH F {
                 sortRankByDesc = ORDER E BY rank_D DESC;
                 topRank = LIMIT sortRankByDesc 1;
                 GENERATE FLATTEN(topRank);
              }
H = FOREACH G GENERATE $1 AS business,$2 AS location,$3 AS period,$4 AS salary;
DUMP H;

Output:

(100,East,1,155)
(100,East,2,255)
(100,East,3,405)
(100,West,1,150)
(100,West,2,350)
(100,West,3,600)
(200,East,1,50)
(200,East,2,100)
(200,East,3,150)
(200,West,1,80)
(200,West,2,180)
(200,West,3,300)
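The two stages of the script above can be sketched in Python to show why the RANK/GROUP/LIMIT step is needed. This is a simulation under stated assumptions, not piggybank code: it assumes, consistent with the output above, that Over with only a function name produces a running total from the start of the ordered bag to the current row, and that RANK numbers rows in the order the flatten emits them. The function names are hypothetical.

```python
from itertools import groupby

# Input rows from the question: (business, location, period, salary)
ROWS = [
    (100, "East", 1, 100), (100, "East", 1, 55), (100, "East", 2, 100),
    (100, "East", 3, 150), (100, "West", 1, 150), (100, "West", 2, 200),
    (100, "West", 3, 250), (200, "East", 1, 50), (200, "East", 2, 50),
    (200, "East", 3, 50), (200, "West", 1, 80), (200, "West", 2, 100),
    (200, "West", 3, 120),
]

def over_and_stitch(rows):
    """Mimic GROUP BY (business, location), ORDER BY period and
    Stitch(C1, Over(C1.salary, 'sum(long)')): every row gets its
    running salary total appended as a fifth field (index 4, i.e. $4)."""
    key = lambda r: (r[0], r[1])
    out = []
    for _, group in groupby(sorted(rows, key=key), key=key):
        running = 0
        for row in sorted(group, key=lambda r: r[2]):  # order A by period
            running += row[3]
            out.append(row + (running,))
    return out

def keep_last_per_period(stitched):
    """Mimic RANK D, GROUP BY (business, location, period),
    ORDER BY rank DESC and LIMIT 1: keep the last row of each period."""
    best = {}
    for rank, row in enumerate(stitched, start=1):  # RANK D: 1, 2, 3, ...
        k = row[:3]                                  # (business, location, period)
        if k not in best or rank > best[k][0]:
            best[k] = (rank, row)
    return [(*k, row[4]) for k, (_, row) in sorted(best.items())]

stitched = over_and_stitch(ROWS)
print(stitched[:2])  # two period-1 rows for 100/East: running totals 100 and 155
for row in keep_last_per_period(stitched):
    print(row)
```

Whichever order the tied period-1 rows end up in, the last of them always carries the full total for that period (155 here), which is why keeping the highest rank per period gives the desired result.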
