如何从pig中的列获取最大值?

zf9nrax1  于 2021-06-24  发布在  Pig
关注(0)|答案(1)|浏览(337)

我有一个表,我想查询一列的和值。下表为详细信息:

grunt>teams_raw = load '/usr/input/Teams.csv' using org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER');
grunt>teams = foreach teams_raw generate $0 as year:int, $1 as lgID, $2 as tmID, $8 as g:float, $9 as w:float, $11 as t:float, $18 as name;
grunt> describe teams
teams: {year: bytearray,lgID: bytearray,tmID: bytearray,g: bytearray,w: bytearray,t: bytearray,name: bytearray};
grunt> gry_by_team = group teams by tmID;

当我试图得到 gteams 表格:

grunt> win = foreach grp_by_team generate group, SUM(teams.g) as win;
grunt>DUMP win
17/05/06 15:32:14 ERROR mapreduce.MRPigStatsUtil: 1 map reduce job(s) failed!
17/05/06 15:32:14 ERROR grunt.Grunt: ERROR 1066: Unable to open iterator for alias win
Details at logfile: /Users/joey/dev/bigdata/pig_1494048371690.log

在日志文件中,我看到以下异常。

================================================================================
Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias win

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias win
        at org.apache.pig.PigServer.openIterator(PigServer.java:1019)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:747)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:231)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:206)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
        at org.apache.pig.Main.run(Main.java:564)
        at org.apache.pig.Main.main(Main.java:176)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
        at org.apache.pig.PigServer.openIterator(PigServer.java:1011)
        ... 13 more
================================================================================

下面是的转储数据 teams 以及 gry_by_team :

grunt>dump teams
...
(1994,NHL,TBL,48,17,3,Tampa Bay Lightning)
(1994,NHL,TOR,48,21,8,Toronto Maple Leafs)
(1994,NHL,VAN,48,18,12,Vancouver Canucks)
(1994,NHL,WAS,48,22,8,Washington Capitals)
(1994,NHL,WIN,48,16,7,Winnipeg Jets)
(1995,NHL,ANA,82,35,8,Mighty Ducks of Anaheim)
(1995,NHL,BOS,82,40,11,Boston Bruins)
(1995,NHL,BUF,82,33,7,Buffalo Sabres)
(1995,NHL,CAL,82,34,11,Calgary Flames)
...

grunt>dump gry_by_team
...
(1912,NHA,TBS,20,9,0,Toronto Blueshirts),(1916,NHA,TBS,14,7,0,Toronto Blueshirts),(1914,NHA,TBS,20,8,0,Toronto Blueshirts)})
(TO1,{(1912,NHA,TO1,20,7,0,Toronto Tecumsehs)})
(TOA,{(1917,NHL,TOA,22,13,0,Toronto Arenas),(1918,NHL,TOA,18,5,0,Toronto Arenas)})
(TOB,{(1916,NHA,TOB,14,7,0,228th Battalion)})
(TOO,{(1913,NHA,TOO,20,4,0,Toronto Ontarios),(1914,NHA,TOO,20,7,0,Toronto Ontarios/Shamrocks)})
...

我不知道我的代码怎么了。
下面是我正在使用的hadoop和pig版本:

$ pig --version
Apache Pig version 0.16.0 (r1746530) 
compiled Jun 01 2016, 23:10:49

$ hadoop version
Hadoop 2.8.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 91f2b7a13d1e97be65db92ddabc627cc29ac0009
Compiled by jdu on 2017-03-17T04:12Z
Compiled with protoc 2.5.0
From source with checksum 60125541c2b3e266cbf3becc5bda666
This command was run using /usr/local/Cellar/hadoop/2.8.0/libexec/share/hadoop/common/hadoop-common-2.8.0.jar
mmvthczy

mmvthczy1#

win = foreach grp_by_team generate group, SUM(teams.g) as win;

在“代码”列中 g 数据类型为 bytearray . SUM 使用以下数据类型: int, long, float, double, bigdecimal, biginteger or bytearray cast as double. . 你需要在这里施展才华 bytearray 作为 double . 有关更多信息,请参阅pig文档。
在代码中定义的架构 grunt>teams = foreach teams_raw generate $0 as year:int, $1 as lgID, $2 as tmID, $8 as g:float, $9 as w:float, $11 as t:float, $18 as name; 不接电话。因此,最好将模式与load语句一起指定。
例如: A = LOAD 'data' AS (a:chararray, b:int, c:int);

相关问题