Apache Pig: sum of fields across columns in Pig

a8jjtwal posted on 2021-06-21 in Pig

I have the following test data.

A   B   C
M   O
M   M   M
M   M   M
N       O
P       N

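I want to get the count of each value across all columns, e.g. M=7, N=2, O=2, P=1, where A, B and C are the column headers. I have written the code below.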

test=  LOAD 'testdata' USING PigStorage(',') as(A:chararray,B:chararray,C:chararray); 
 values = FOREACH test GENERATE (A==''?'null':(A is null?'null':A)) as A,(B==''?'null':(B is null?'null':B)) as B,(C==''?'null':(C is null?'null':C)) as C;  
 grp = GROUP values ALL;  
 A = FOREACH grp {
 B =FILTER test.A=='M' OR test.B=='M' OR test.C=='M';
 C =FILTER test.A=='N' OR test.B=='N' OR test.C=='N';
 D =FILTER test.A=='O' OR test.B=='O' OR test.C=='O';
 E =FILTER test.A=='P' OR test.B=='P' OR test.C=='P';
 GENERATE group, COUNT(B), COUNT(C),COUNT(D),COUNT(E);
  };

I get the error "Scalar has more than one row in the output". Any input would be helpful!

js81xvg6

Load each record as a single line, tokenize it into fields, then group and count.

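-- Load each record as one chararray field
A = load 'testdata' as (line:chararray);
-- TOKENIZE splits the line into a bag of words; flatten emits one row per word
B = foreach A generate flatten(TOKENIZE((chararray)line)) as word;
-- Group identical words and count the rows in each group
C = group B by word;
D = foreach C generate group,COUNT(B);
DUMP D;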
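
If you would rather keep the three-column schema from the question, one possible alternative is to unpivot the columns with TOBAG and FLATTEN before grouping. This is only a sketch and has not been run against the asker's file: it assumes the same comma-delimited 'testdata' input, and the alias names vals, nonempty, grp and counts are made up for illustration.

```pig
-- Sketch only: assumes the comma-delimited 'testdata' file from the question.
test = LOAD 'testdata' USING PigStorage(',') AS (A:chararray, B:chararray, C:chararray);

-- TOBAG puts the three column values into one bag; FLATTEN emits one row per value.
vals = FOREACH test GENERATE FLATTEN(TOBAG(A, B, C)) AS val;

-- Drop missing or empty fields so they are not counted as a value.
nonempty = FILTER vals BY val IS NOT NULL AND val != '';

-- One group per distinct value, then count the rows in each group.
grp = GROUP nonempty BY val;
counts = FOREACH grp GENERATE group AS val, COUNT(nonempty) AS cnt;
DUMP counts;
```

Unlike the nested FOREACH in the question, this never refers to a relation such as test.A as if it were a single value. That kind of reference is treated as a scalar, which is the usual cause of the "Scalar has more than one row in the output" error.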
