infiles = load '/hdfs/bhavesh/Youtube_POC/Youtube/0222/{0,1,2,3,4}.txt' using PigStorage('\t') as
(videoid:chararray,uploader:chararray,age:int,category:chararray,length:int,views:int,rate:int,rating:int,comments:int,related_id:chararray);
files = FILTER infiles BY category is not null;
grpn_for_catagories = group files by category;
cnt_for_catagories = foreach grpn_for_catagories generate group, COUNT(files.videoid) as counting;
sorted_for_catagories_desc = order cnt_for_catagories by counting desc;
top5_for_catagories = limit sorted_for_catagories_desc 5;
3 Answers
icnyk63a1#
a = load 'file' using PigStorage() as (name:chararray, count:int);
b = order a by count;
c = limit b 1;
d = foreach c generate name;
dump d;
whhtz7ly2#
The example below will help you get the top 5 counts.
For a detailed explanation, see
http://ybhavesh.blogspot.in/2015/08/proof-of-concept-or-poc-on-youtube-data.html
Hope this helps!
a6b3iqyw3#
You can use ORDER BY and LIMIT:
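a = load 'file' using PigStorage() as (name:chararray, count:int);
b = order a by count; -- by default, this sorts in ascending order
c = limit b 1;
d = foreach c generate name;
dump d;
b = order a by count desc; -- desc sorts in descending order, so the largest count comes first
c = limit b 1;
d = foreach c generate name;
dump d;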