How do I get the distinct values for a group of fields in Pig?

ykejflvf  posted on 2021-06-21  in  Pig

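Is it possible to get the following output in Pig? Can I group by the first and second fields and then take the distinct values of the third field?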

For example, I have this input data:

12345|9658965|52145
12345|9658965|52145
12345|9658965|52145
23456|8541232|96589
23456|8541232|96585

I want output something like this:

    12345|9658965|52145
    23456|8541232|96589
    23456|8541232|96585

uttx8gqw #1

Try this; it is very similar:

    A = LOAD 'test.csv' USING PigStorage('|') AS (a1, a2, a3);
    unique =
        FOREACH (GROUP A BY a3) {
            b = A.(a1, a2);
            s = DISTINCT b;
            GENERATE FLATTEN(s), group AS a4;
        };
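
If you prefer to follow the question's wording literally (group by the first two fields, then de-duplicate the third within each group), a minimal sketch might look like the following. The `chararray` types and the aliases `grouped`, `thirds`, `uniq_thirds`, and `result` are assumptions made for illustration and are not part of the original answer.

```pig
-- Assumed schema: three pipe-delimited fields, loaded as chararray for illustration.
A = LOAD 'test.csv' USING PigStorage('|') AS (a1:chararray, a2:chararray, a3:chararray);

-- Group on the first two fields.
grouped = GROUP A BY (a1, a2);

-- Inside each group, project the third field and de-duplicate it.
result = FOREACH grouped {
    thirds = A.a3;
    uniq_thirds = DISTINCT thirds;
    GENERATE FLATTEN(group) AS (a1, a2), FLATTEN(uniq_thirds) AS a3;
};

DUMP result;
```

For the sample input this should produce one row per distinct combination of the three fields. The order of the rows is not guaranteed.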

yebdmbv4 #2

Method 1: Use DISTINCT
Ref: http://pig.apache.org/docs/r0.12.0/basic.html#distinct
The DISTINCT operator should help here.

test = LOAD 'test.csv' USING PigStorage('|');
distinct_recs = DISTINCT test;
DUMP distinct_recs;

Method 2: Group by all fields

test = LOAD 'test.csv' USING PigStorage('|');
grp_all_fields = GROUP test BY ($0,$1,$2);
uniq_recs = FOREACH grp_all_fields GENERATE FLATTEN(group);
DUMP uniq_recs;

Both approaches give the expected output for the input you shared.
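
One caveat: DISTINCT works on whole records, so Method 1 matches the expected output only because the file contains exactly these three fields. If the file had additional columns, one possible adjustment is to project the fields of interest first. This is a sketch under that assumption; the aliases `three_fields` and `distinct_three` are hypothetical.

```pig
test = LOAD 'test.csv' USING PigStorage('|');

-- Keep only the first three fields (positional references, no schema assumed).
three_fields = FOREACH test GENERATE $0, $1, $2;

-- De-duplicate the projected records.
distinct_three = DISTINCT three_fields;
DUMP distinct_three;
```

Method 2 already restricts itself to `$0`, `$1`, and `$2` through the GROUP key, so it would not need this extra projection.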
