Removing a tuple from a data bag

v6ylcynt posted on 2021-05-29 in Hadoop

Here is the code in question:

a = LOAD 'tellers' using TextLoader() AS line;
-- convert a to chararray
b = foreach a generate (chararray)line;
-- run through my UDF to create tuples
c = foreach b generate myudfs.TellerParser5(line);  -- ({(20),(5),(5),(10),(1),(1),(1),(1),(1),(5),(10),(10),(10)})....
d = foreach c generate flatten(number);
e = group d by number; -- {group: chararray,d: {(number: chararray)}}
f = foreach e generate group, COUNT(d);  -- f: {group: chararray,long}

In the data bag f, there is a tuple with an empty key, (,1), that I want to filter out or remove.

dump f;
 (,1)
 (1,97)
 (5,49)
 (10,87)
 (20,24)

 describe f;
 f: {group: chararray,long}

I tried the following without success (nothing changed):

remove_tuple = filter f BY group is not null;

x0fgdtte1#

group is a Pig keyword. Hopefully the approach will work if you use some other word for the field name.
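
As a rough sketch of that suggestion, the group key can be given its own name with AS before it is filtered. The input file numbers.txt and the aliases num and cnt are hypothetical and do not come from the original post.

```pig
-- Hypothetical input: one number per line.
d = LOAD 'numbers.txt' USING PigStorage() AS (number:chararray);
e = GROUP d BY number;
-- Rename the group key so later statements do not refer to the keyword `group`.
f = FOREACH e GENERATE group AS num, COUNT(d) AS cnt;
-- Filter on the renamed field.
remove_tuple = FILTER f BY num IS NOT NULL;
DUMP remove_tuple;
```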


wkftcu5l2#

I solved this by adding a step that casts the value to int:

e = foreach d generate (int)$0; -- this is the key added step

f = group e by number; -- {group: int,e: {(number: int)}}
g = foreach f generate group, COUNT(e);  -- g: {group: int,long}
h = foreach f generate group, SUM(e);
i = filter g by $0 is not null;
dump i;
 (1,97)
 (5,49)
 (10,87)
 (20,24)
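
A possible explanation for why the cast helps, offered as an assumption rather than something stated in this answer: Pig yields null when a chararray value cannot be converted to int, so an empty string becomes null and IS NOT NULL can then drop it. Below is a minimal sketch that filters before grouping. The file name numbers.txt and the alias e_clean are hypothetical.

```pig
-- Hypothetical input: one number per line, possibly with empty lines.
d = LOAD 'numbers.txt' USING TextLoader() AS (number:chararray);
-- A cast that cannot be parsed as an int is expected to produce null.
e = FOREACH d GENERATE (int)number AS number;
-- Drop the null values before grouping, so no null group is created.
e_clean = FILTER e BY number IS NOT NULL;
f = GROUP e_clean BY number;
g = FOREACH f GENERATE group AS number, COUNT(e_clean) AS cnt;
DUMP g;
```

Filtering before the GROUP avoids building a group for the unwanted key at all, instead of removing it afterwards.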

k97glaaz3#

Null values can be filtered out by using != 'null' as the condition. I used the following as input.

(,1)
(1,97)
(5,49)
(10,87)
(20,24)

Here is how to filter out the nulls.

A = LOAD 'file' using PigStorage(',') AS (a:chararray,b:long);
B = FILTER A BY a!='null';
DUMP B;

So for your script:

remove_tuple = filter f BY group!='null';

Output:

(1,97)
(5,49)
(10,87)
(20,24)
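
Note that 'null' in the filter above is a string literal, not the null value. In Pig a comparison involving null evaluates to null, and FILTER does not pass such rows through, which would explain why the row disappears here. The sketch below assumes the unwanted key may be either null or an empty string and tests for both explicitly. It reuses the hypothetical 'file' input from the example above.

```pig
A = LOAD 'file' USING PigStorage(',') AS (a:chararray, b:long);
-- Keep only rows whose key is neither null nor an empty string.
B = FILTER A BY a IS NOT NULL AND a != '';
DUMP B;
```

If the key in the original script is an empty string rather than null, this would also account for IS NOT NULL alone leaving the (,1) tuple in place.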
