I'm new to Pig and trying to understand why I can't count after a join and a group by:
A = LOAD 'mary' as (line);
B = LOAD 'mary' as (line);
wordsA = foreach A generate flatten(TOKENIZE(line)) as wordA;
grpdA = group wordsA by wordA;
cntdA = foreach grpdA generate group, COUNT(wordsA);
wordsB = foreach B generate flatten(TOKENIZE(line)) as wordB;
grpdB = group wordsB by wordB;
cntdB = foreach grpdB generate group, COUNT(wordsB), 'some text';
fltB = FILTER cntdB BY $1>1;
jnd = join cntdA by $1, fltB by $1;
jnd_n = foreach jnd generate $0;
grp = group jnd by $0;
out = foreach grp generate group, count(jnd_n);
dump jnd_n;
dump grp;
Dump of jnd_n:
(was)
(was)
(was)
(lamb)
(lamb)
(lamb)
(Mary)
(Mary)
(Mary)
Dump of grp:
(was,{(was,2,was,2,some text),(was,2,Mary,2,some text),(was,2,lamb,2,some text)})
(Mary,{(Mary,2,was,2,some text),(Mary,2,Mary,2,some text),(Mary,2,lamb,2,some text)})
(lamb,{(lamb,2,was,2,some text),(lamb,2,Mary,2,some text),(lamb,2,lamb,2,some text)})
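But I get an error:
Invalid scalar projection: jnd_n : A column needs to be projected from a relation for it to be used as a scalar
If I try to change the code to: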
out = foreach grp generate group, count(jnd_n.$0);
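then I get another error:
Could not generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve count using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
I know I can do this another way, but I want to get the following result after performing the two Pig operations, join and group by:
Dump: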
(was,3)
(lamb,3)
(Mary,3)
1 Answer
xwbd5t1u #1
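COUNT needs to be in uppercase. Function names in Pig are case-sensitive, so the lowercase count is not resolved to the built-in COUNT, which is what error 1070 is reporting.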
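As a minimal sketch of how the last statements of the question's script might look with that fix applied, assuming cntdA and fltB are defined exactly as in the question: besides the uppercase COUNT, the bag passed to COUNT has to be the relation that was actually grouped. In the question, grp groups jnd, so jnd_n inside the foreach is treated as a separate relation used as a scalar, which is most likely what triggers the "Invalid scalar projection" error.

```pig
-- Assumes cntdA and fltB are built as in the question.
jnd   = join cntdA by $1, fltB by $1;
jnd_n = foreach jnd generate $0;

-- Group the projected relation, so the bag inside grp is named jnd_n.
grp = group jnd_n by $0;

-- COUNT in uppercase, applied to the bag that grp actually contains.
out = foreach grp generate group, COUNT(jnd_n);

dump out;
```

With the data shown in the question, this should produce one row per word with a count of 3, although the row order is not guaranteed. Keeping grp = group jnd by $0 and writing COUNT(jnd) instead would presumably work just as well, since the bag is then named jnd.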