Hi Mavens,
I have a dataset:
Field_A Field_B DATE
John 1 01-01-2016
John 1 05-01-2016
Cate 1 05-01-2016
Cate 4 01-01-2016
Cate 6 05-01-2016
Perdi 4 01-01-2016
I'm trying to compute count(*) for each Field_A value and create a rank based on Field_A and DATE. Basically, I want to return this:
Field_A Count Rank Field_B
John 2 1 1
John 2 2 1
Cate 3 3 1
Cate 3 4 4
Cate 3 3 6
Perdi 1 5 4
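The expected output above appears to follow two rules. Count is the number of rows for each Field_A value. Rank is a dense number given to each distinct (Field_A, DATE) pair in the order the pairs first appear (John, then Cate, then Perdi, which is not alphabetical). That reading is inferred from the sample rather than stated, so the Python sketch below only illustrates it. It is not a Pig solution, and the function name `count_and_rank` is hypothetical.

```python
from collections import Counter

# Sample rows from the question: (Field_A, Field_B, DATE)
rows = [
    ("John", 1, "01-01-2016"),
    ("John", 1, "05-01-2016"),
    ("Cate", 1, "05-01-2016"),
    ("Cate", 4, "01-01-2016"),
    ("Cate", 6, "05-01-2016"),
    ("Perdi", 4, "01-01-2016"),
]


def count_and_rank(rows):
    """Return (Field_A, Count, Rank, Field_B) tuples.

    Count: number of rows sharing the same Field_A.
    Rank: dense id per distinct (Field_A, DATE) pair, assigned in order
    of first appearance (inferred from the sample output, an assumption).
    """
    counts = Counter(field_a for field_a, _, _ in rows)
    ranks = {}
    result = []
    for field_a, field_b, date in rows:
        key = (field_a, date)
        if key not in ranks:
            ranks[key] = len(ranks) + 1  # next unused rank, no gaps
        result.append((field_a, counts[field_a], ranks[key], field_b))
    return result


for row in count_and_rank(rows):
    print(*row)
```

Running it prints the six rows of the expected output table in the same order.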
To do this, I tried the following code:
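-- Load the dataset
DATA = load '...'
    AS (Field_A:Int,
        FIELD_B:Int,
        DATE:CHARARRAY);
-- RANK prepends a rank column (rank_DATA), so $0 below refers to it, not to Field_A
A = rank DATA BY Field_A;
B = GROUP A BY $0;
-- Count the non-null Field_A values in each group (COUNT skips nulls)
C = foreach B {
    CNT = COUNT(A.Field_A);
    generate $0, CNT;
}
-- Join the per-group counts back onto the ranked rows
D = join A by $0, C by $0;
-- Dense rank over (DATE, Field_A): ties share a rank, with no gaps
E = rank D BY DATE,Field_A DENSE;
-- $0 of E is the rank column added by the RANK above
F = foreach E generate $0 AS RANK,Field_A,CNT;
DUMP F;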
But I get the following error:
<file script.pig, line 35, column 69> Invalid field projection. Projected field [CNT] does not exist in schema;
How can I fix this?
Thanks a lot!
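For comparison, here is a rough Python emulation of what the Pig script is meant to compute, assuming Field_A is treated as a string (the script declares it as Int). GROUP plus COUNT gives, in effect, one count per Field_A value. The JOIN attaches that count to each row, and `RANK ... BY DATE, Field_A DENSE` numbers the distinct (DATE, Field_A) keys in ascending sort order. The helper `pig_style` is hypothetical. Rows are kept in input order here, whereas Pig's RANK output comes back sorted.

```python
from collections import Counter

# Sample rows from the question: (Field_A, Field_B, DATE)
rows = [
    ("John", 1, "01-01-2016"),
    ("John", 1, "05-01-2016"),
    ("Cate", 1, "05-01-2016"),
    ("Cate", 4, "01-01-2016"),
    ("Cate", 6, "05-01-2016"),
    ("Perdi", 4, "01-01-2016"),
]


def pig_style(rows):
    """Rough emulation of the script's GROUP/COUNT, JOIN and DENSE RANK steps.

    Returns (rank, Field_A, CNT) tuples, mirroring relation F, but keeps
    the input row order instead of Pig's rank-sorted output.
    """
    # GROUP + COUNT: one count per Field_A value
    counts = Counter(field_a for field_a, _, _ in rows)
    # JOIN: attach the per-group count to every original row
    joined = [(a, b, d, counts[a]) for a, b, d in rows]
    # RANK ... BY DATE, Field_A DENSE: number the distinct (DATE, Field_A)
    # keys 1, 2, 3, ... in ascending order, with ties sharing a rank
    keys = sorted({(d, a) for a, _, d, _ in joined})
    dense = {key: i for i, key in enumerate(keys, start=1)}
    return [(dense[(d, a)], a, cnt) for a, _, d, cnt in joined]


for row in pig_style(rows):
    print(*row)
```

With these sample rows the ranks come out as 2, 5, 4, 1, 4, 3 rather than the 1, 2, 3, 4, 3, 5 wanted above, so a sorted DENSE rank on these columns will not reproduce that numbering by itself. DATE is a chararray and is therefore compared as a string. That happens to match calendar order for these two dates, but not for every date format.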
2 Answers
I changed Field_A to chararray and used a tab-delimited ('\t') file. I'm not thrilled with how many statements my solution below takes, but it works: