I'm new here and got stuck on the example below. Can anyone help me produce the output shown below using a Pig script?
Input:
1|ABC|NC
1|DEF|NC
2|CFD|NY
2|CGF|NY
Output:
1|ABC,DEF|NC
2|CFD,CGF|NY
Script:
A = LOAD 'testfile.txt' USING PigStorage('|') AS (Id:chararray,name:chararray,state:chararray);
B = FOREACH A GENERATE Id,name;
C = FOREACH A GENERATE Id,name,state;
C = DISTINCT C;
GROUPED = GROUP B BY Id;
D = FOREACH GROUPED GENERATE group AS Id,c.name AS name_val;
E = JOIN D BY Id, C BY Id;
X = FOREACH E GENERATE D.Id as docid,D.name_val as termid,C.state;
Dump X;
2 Answers
2skhul33 1#
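Load the data and group it by columns 1 and 3 (Id and state), then generate the columns in the order you need to get the desired output. Grouping on both keys at once removes the need for the separate DISTINCT and JOIN steps.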
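A minimal sketch of that approach, assuming Pig 0.11 or later (for the built-in BagToString) and the same testfile.txt as in the question. The aliases G, sorted and X and the output path 'grouped_output' are hypothetical names chosen for illustration. The nested ORDER is included only because the order of tuples inside a grouped bag is otherwise not guaranteed.

```pig
-- Load the pipe-delimited input with the schema from the question.
A = LOAD 'testfile.txt' USING PigStorage('|')
    AS (Id:chararray, name:chararray, state:chararray);

-- Group on both Id and state so each (Id, state) pair yields one row.
G = GROUP A BY (Id, state);

-- For each group, sort the names and join them with commas.
X = FOREACH G {
    sorted = ORDER A BY name;
    GENERATE group.Id AS Id,
             BagToString(sorted.name, ',') AS names,
             group.state AS state;
};

-- Write pipe-delimited rows; 'grouped_output' is a hypothetical path.
STORE X INTO 'grouped_output' USING PigStorage('|');
```

STORE with PigStorage('|') is used here instead of DUMP because DUMP prints tuples with comma separators, which would be hard to tell apart from the commas inside the joined names.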
nzrxty8p 2#