Apache Pig word count program

2ul0zpep  posted on 2021-06-21  in  Pig

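In a word count program, how do I find the most frequent word and the least frequent word in Pig? How can I use the MAX function here?
The output I see looks like this:
(naveen,3)(is,5)
What I need is "is".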

icnyk63a  #1

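a = LOAD 'file' USING PigStorage() AS (name:chararray, count:int);
b = ORDER a BY count;  -- ascending by default, so the first row has the smallest count
c = LIMIT b 1;
d = FOREACH c GENERATE name;
DUMP d;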

whhtz7ly  #2

The following example shows how to get the top 5 counts:

infiles = load '/hdfs/bhavesh/Youtube_POC/Youtube/0222/{0,1,2,3,4}.txt' using PigStorage('\t') as 
 (videoid:chararray,uploader:chararray,age:int,category:chararray,length:int,views:int,rate:int,rating:int,comments:int,related_id:chararray);
files = FILTER infiles BY category is not null;
grpn_for_catagories = group files by category;
cnt_for_catagories = foreach grpn_for_catagories generate group, COUNT(files.videoid) as counting;
sorted_for_catagories_desc = order cnt_for_catagories by counting desc;
top5_for_catagories = limit sorted_for_catagories_desc 5;

For a detailed explanation, see
http://ybhavesh.blogspot.in/2015/08/proof-of-concept-or-poc-on-youtube-data.html
Hope this helps!

a6b3iqyw  #3

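You can use ORDER BY and LIMIT:
a = LOAD 'file' USING PigStorage() AS (name:chararray, count:int);
-- least frequent word: ORDER BY sorts in ascending order by default
b = ORDER a BY count;
c = LIMIT b 1;
d = FOREACH c GENERATE name;
DUMP d;
-- most frequent word: sort in descending order instead
b = ORDER a BY count DESC;
c = LIMIT b 1;
d = FOREACH c GENERATE name;
DUMP d;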
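
Since the question also asks about MAX, here is a minimal sketch of that route. It assumes the same 'file' input and (name, count) schema used above. The aliases words, all_words, extremes, most and least are illustrative names, not from the original posts. It relies on Pig's scalar projection, where a single-row relation can be referenced as a value inside an expression.

```pig
-- Same placeholder input and schema as in the answers above.
words = LOAD 'file' USING PigStorage() AS (name:chararray, count:int);

-- Collapse everything into one group so MAX/MIN see every row.
all_words = GROUP words ALL;
extremes = FOREACH all_words GENERATE
    MAX(words.count) AS max_count,
    MIN(words.count) AS min_count;

-- extremes has exactly one row, so its fields can be used as scalars.
most = FILTER words BY count == extremes.max_count;
least = FILTER words BY count == extremes.min_count;

DUMP most;
DUMP least;
```

Unlike ORDER plus LIMIT 1, this keeps every word that ties for the highest or lowest count, at the cost of an extra pass over the data.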
