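I'm new to Pig and am working on a problem where I need to find the player with the maximum weight in this dataset. Here is a sample of the data: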
id, weight,id,year, triples
(bayja01,210,bayja01,2005,6)
(crawfca02,225,crawfca02,2005,15)
(damonjo01,205,damonjo01,2005,6)
(dejesda01,190,dejesda01,2005,6)
(eckstda01,170,eckstda01,2005,7)
Here is my Pig script:
batters = LOAD 'hdfs:/user/maria_dev/pigtest/Batting.csv' using PigStorage(',');
realbatters = FILTER batters BY $1==2005;
triphitters = FILTER realbatters BY $9>5;
tripids = FOREACH triphitters GENERATE $0 AS id,$1 AS YEAR, $9 AS Trips;
names = LOAD 'hdfs:/user/maria_dev/pigtest/Master.csv'
using PigStorage(',');
weights = FOREACH names GENERATE $0 AS id, $16 AS weight;
get_ids = JOIN weights BY (id), tripids BY(id);
wts = FOREACH get_ids GENERATE MAX(get_ids.weight)as wgt;
DUMP wts;
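Of course, the second-to-last line doesn't work. Pig tells me I have to use an explicit cast. I have worked out the filtering and so on, but I can't figure out how to get the final answer.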
1 Answer
Answer #1 (xlpyo6sf)
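The MAX function in Pig takes a bag of values and returns the highest value in that bag. To create a bag, you must first GROUP your data. If you want the maximum weight across all of the data, you can use GROUP ALL: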
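A minimal sketch of that approach, assuming `get_ids` is the joined relation from the question's script and that `weight` is the only field with that name after the join. The alias `all_rows` is a name chosen here for illustration, not something from the original script:

```pig
-- Assumes get_ids was built by the JOIN in the question's script.
-- GROUP ALL collects every joined row into a single group whose bag
-- is named after the input relation (get_ids).
all_rows = GROUP get_ids ALL;

-- MAX now receives a bag of weight values and returns the largest one.
wts = FOREACH all_rows GENERATE MAX(get_ids.weight) AS wgt;
DUMP wts;
```

Because the files are loaded without a schema, `weight` is a bytearray. If Pig still asks for an explicit cast, one option is to cast the column where it is first projected, for example `(int)$16 AS weight` in the `weights` statement, assuming the column holds whole numbers.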