REGISTER /tmp/datafu-1.2.0.jar;
DEFINE BagSplit datafu.pig.bags.BagSplit();
A = LOAD 'input.txt' USING PigStorage(',') AS(f1,f2,f3,f4);
B = GROUP A ALL;
C = FOREACH B GENERATE FLATTEN(BagSplit(2,$1)) AS mybag;
D = FOREACH C GENERATE FLATTEN(STRSPLIT(REPLACE(BagToString(mybag),'_null_null_null_null',''),'_',4));
E = FOREACH D GENERATE $2,$3,$0,$1;
DUMP E;
1条答案
按热度按时间r6hnlfcb1#
用本地Pig很难解决这个问题。一个选择是下载
datafu-1.2.0.jar
并尝试以下方法。输入文件
Pig手稿:
输出:
注:基于上述输入格式,我假设第一行最后两列为空,第二行前两列为空,第三行和第四行也一样