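I'm new to Apache Pig and can't figure out what goes wrong when I use piggybank's Over function for a cumulative calculation. For each business and location, I want the cumulative salary per period for the following data: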
business|location|period|salary
--------+--------+------+-------
100 | East | 1 | 100
100 | East | 1 | 55
100 | East | 2 | 100
100 | East | 3 | 150
100 | West | 1 | 150
100 | West | 2 | 200
100 | West | 3 | 250
200 | East | 1 | 50
200 | East | 2 | 50
200 | East | 3 | 50
200 | West | 1 | 80
200 | West | 2 | 100
200 | West | 3 | 120
The result I want is:
business|location|period|cumulative salary
--------+--------+------+---------------
100 | East | 1 | 155
100 | East | 2 | 255
100 | East | 3 | 405
100 | West | 1 | 150
100 | West | 2 | 350
100 | West | 3 | 600
200 | East | 1 | 50
200 | East | 2 | 100
200 | East | 3 | 150
200 | West | 1 | 80
200 | West | 2 | 180
200 | West | 3 | 300
According to the docs, I should be able to do this with:
REGISTER /opt/mapr/pig/pig-0.12/contrib/piggybank/java/piggybank.jar;
A = LOAD '/user/sliang/pig/testData' USING PigStorage(',') as (business:long, location:chararray, period:long, salary:long);
B = group A by (business, location);
C = foreach B {
C1 = order A by period;
generate flatten(Stitch(C1, Over(C1.salary, 'sum(long)')));
};
D = foreach C generate business, location, period, $9;
But I get an error starting at C:
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve Stitch using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
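I googled it, but there isn't much information about this. I also tested the jar with other piggybank functions and they work, so I guess the problem is not that piggybank is registered incorrectly. I'm using Pig 0.12.
Any help is much appreciated. Thank you!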
1 Answer
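Use the full package path of the Stitch and Over commands, i.e. replace Stitch with org.apache.pig.piggybank.evaluation.Stitch and Over with org.apache.pig.piggybank.evaluation.Over. Registering piggybank.jar only puts the classes on the classpath; the error message shows that Pig looks up unqualified function names only in java.lang., org.apache.pig.builtin. and org.apache.pig.impl.builtin., so piggybank functions must be referenced by their full name.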
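Applied to the statement from the question, the replacement could look like the sketch below. Only the two function names are changed; everything else is the asker's original code, so whether the remaining arguments produce the desired result is not addressed here.

```pig
C = foreach B {
    C1 = order A by period;
    -- Fully qualified class names, so Pig does not need to search its default import list
    generate flatten(org.apache.pig.piggybank.evaluation.Stitch(C1, org.apache.pig.piggybank.evaluation.Over(C1.salary, 'sum(long)')));
};
```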
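If you want to avoid these lengthy package names in your Pig script, you can define your own short alias for each function (for example with DEFINE) and use it in the script. Updated Pig script: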
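A minimal sketch of such a script, not the answerer's original. The aliases MyStitch and MyOver are hypothetical names chosen for illustration. The window arguments -1 and 0 are passed explicitly on the assumption that, as described in the Over documentation, they mean "unbounded preceding" and "current row", which is what a running total needs; this is worth checking against the piggybank version in use. The final projection uses positions and assumes the four input fields come first with the running sum appended as the fifth column ($4), rather than the $9 copied from the documentation example.

```pig
REGISTER /opt/mapr/pig/pig-0.12/contrib/piggybank/java/piggybank.jar;

-- Hypothetical short aliases for the fully qualified piggybank classes
DEFINE MyStitch org.apache.pig.piggybank.evaluation.Stitch();
DEFINE MyOver org.apache.pig.piggybank.evaluation.Over();

A = LOAD '/user/sliang/pig/testData' USING PigStorage(',')
    AS (business:long, location:chararray, period:long, salary:long);

-- One group per (business, location) pair
B = GROUP A BY (business, location);

C = FOREACH B {
    -- Order each group by period so the running sum follows time
    C1 = ORDER A BY period;
    -- Assumed window: -1 = unbounded preceding, 0 = current row
    GENERATE FLATTEN(MyStitch(C1, MyOver(C1.salary, 'sum(long)', -1, 0)));
};

-- Assumed layout: $0..$3 are the input fields, $4 is the running sum
D = FOREACH C GENERATE $0, $1, $2, $4;
DUMP D;
```

Note that this emits one row per input row, so a period that appears twice in a group (such as period 1 for business 100, East) yields two rows; an extra aggregation step would be needed to collapse them into one.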
Output: