Pig script to concatenate values within tuples

ct2axkht posted on 2021-06-24 in Pig

Input:

(11111111,{(A,MARK,APPLE,ABC1,11111111),(B,PAUL,AMAZON,ABC2,11111111),(C,TIM,FIVN,ABC3,11111111),(D,LIN,MULESFT,ABC4,11111111),(E,YEP,UHG,ABC5,11111111),(F,QIN,ATT,ABC6,11111111)})
(22222222,{(A,MARK,APPLE,ABC6,22222222),(B,MARK,AMAZON,ABC7,22222222),(C,MARK,PQE,ABC8,22222222),(D,MARK,AMB,ABC9,22222222),(E,MARK,YZQ,ABC19,22222222),(F,MARK,PQR,,22222222)})

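I grouped the data by the key shown above. I now need to produce output by concatenating all the values in the tuples of each group, column by column and including null values, as shown below:
Output: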

(1111111,A^B^C^D^E^F,MARK^PAUL^TIM^LIN^YEP^QIN,APPLE^AMAZON^FIVN^MULESFT^UHG^ATT,ABC1^ABC2^ABC3^ABC4^ABC5^^ABC6)
(2222222,A^B^^D^E^G,TIM^AIN^TIM^BIN^CIN^DIN^RIN,APPLE^AMAZON^PQE^AMB^YZQ^RIN,ABC6^ABC7^ABC8^ABC9^ABC19^^)

Can anyone help me with this?

bogh5gae #1

Here is a code snippet that may help. You can build on it to reach the expected output.
Input:

1,A
1,B
1,C
2,D
2,E
2,F

Output:

(1,C^B^A)
(2,F^E^D)

Pig code snippet:

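-- Load (id, name) pairs from a comma-separated file.
data1 = LOAD '/Users/muralirao/learning/pig/a.csv' USING PigStorage(',') AS (id:int, name:chararray);
-- Group by id and join each group's names with '^'.
-- Bags are unordered, so the order of the joined names is not guaranteed.
req_data = FOREACH (GROUP data1 BY id) {
    names = data1.name;
    GENERATE group AS id, BagToString(names, '^');
};

DUMP req_data;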
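
The snippet above concatenates a single column. One possible way to extend it to the five-column layout from the question is sketched below. It is only a sketch under several assumptions: the path 'input.csv' and the column names (code, name, company, ref, id) are hypothetical, since the question gives neither, and the data is assumed to be available as flat comma-separated rows before grouping. Nulls are replaced with empty strings first so that a missing value still occupies a position; how BagToString renders a raw null may vary, so check the result on your Pig version.

```pig
-- Hypothetical path and column names; the question specifies neither.
records = LOAD 'input.csv' USING PigStorage(',')
          AS (code:chararray, name:chararray, company:chararray, ref:chararray, id:chararray);

-- Turn nulls into empty strings so every row contributes one slot per column.
cleaned = FOREACH records GENERATE
    (code IS NULL ? '' : code) AS code,
    (name IS NULL ? '' : name) AS name,
    (company IS NULL ? '' : company) AS company,
    (ref IS NULL ? '' : ref) AS ref,
    id;

grouped = GROUP cleaned BY id;

concatenated = FOREACH grouped {
    -- Sort inside each group so the four joined strings line up position by position.
    sorted = ORDER cleaned BY code;
    GENERATE group AS id,
             BagToString(sorted.code, '^') AS codes,
             BagToString(sorted.name, '^') AS names,
             BagToString(sorted.company, '^') AS companies,
             BagToString(sorted.ref, '^') AS refs;
};

DUMP concatenated;
```

Sorting by the first column is a design choice for this sketch; any column that gives a stable order within a group would serve the same purpose.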
