Pig filter

Posted by nzkunb0c on 2021-05-29 in Hadoop

I am trying to filter on a positional field ($14).

X = FILTER C BY($14 matches '.*USD.*');
STORE X into '$output' using PigStorage(',');

The statement above does not work, but if I only output $14:

E = FOREACH C GENERATE FLATTEN($14);
STORE E into '$output' using PigStorage(',');

it works fine.
Sample data:

304a285281be,1383027928890968764,receiver,10C,655362,C2,USD811289,1,0,0,ebay_checkout,cc,cc,USD2659,USD120
304a285281be,1383027928890968764,receiver,10C,655362,C2,USD811289,1,0,0,ebay_checkout,cc,cc,USD2659,USD0
304a285281be,1383027928890968764,receiver,10C,655362,C2,USD811289,1,0,0,ebay_checkout,cc,cc,USD2659,GBP0

Sample output:

304a285281be,1383027928890968764,receiver,10C,655362,C2,USD811289,1,0,0,ebay_checkout,cc,cc,USD2659,USD0
304a285281be,1383027928890968764,receiver,10C,655362,C2,USD811289,1,0,0,ebay_checkout,cc,cc,USD2659,GBP0
Answer 1 (vohkndzv):

Add a space between BY and the opening parenthesis:

-- FLATTEN is only allowed in FOREACH ... GENERATE, so $14 is referenced directly here
X = FILTER C BY ($14 matches '.*USD.*');
STORE X into '$output' using PigStorage(',');
Answer 2 (pbossiut):

Your statement works for me:

A = LOAD 'StackFile.txt'  using PigStorage(',');
B = FILTER A BY ($14 matches '.*USD.*');
DUMP B;
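For comparison, a slightly more explicit version of the same filter might look like this. It is only a sketch that assumes the same comma-delimited input file. The explicit (chararray) cast is an addition that is probably optional, since a field loaded without a schema is a bytearray and Pig should cast it implicitly for matches. As far as I know, Pig's matches operator uses Java regular expressions and must match the whole field, which is why the pattern needs the leading and trailing .* (a bare 'USD' would not match a value like USD120).

```pig
-- Load without a schema: every field is an untyped bytearray
A = LOAD 'StackFile.txt' USING PigStorage(',');

-- $14 is the 15th field (positions are zero-based);
-- the cast to chararray is an optional, explicit conversion before the regex test
B = FILTER A BY ((chararray)$14 matches '.*USD.*');

STORE B INTO '$output' USING PigStorage(',');
```

With the three sample rows, this should keep the USD120 and USD0 rows and drop the GBP0 row.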

Output:
304a285281be,1383027928890968764,receiver,10C,655362,C2,USD811289,1,0,0,ebay_checkout,cc,cc,USD2659,USD120
304a285281be,1383027928890968764,receiver,10C,655362,C2,USD811289,1,0,0,ebay_checkout,cc,cc,USD2659,USD0
