Pig Latin: remove a tuple from a data bag

Posted by v6ylcynt on 2021-05-29 in Hadoop


Here is the code I am having trouble with:

a = LOAD 'tellers' using TextLoader() AS line;
-- convert a to chararray
b = foreach a generate (chararray)line;
-- run through my UDF to create tuples
c = foreach b generate myudfs.TellerParser5(line);  -- ({(20),(5),(5),(10),(1),(1),(1),(1),(1),(5),(10),(10),(10)})....
d = foreach c generate flatten(number);
e = group d by number;  -- e: {group: chararray,d: {(number: chararray)}}
f = foreach e generate group, COUNT(d);  -- f: {group: chararray,long}

In the data bag f there is a tuple with an empty key, (,1), which I want to filter out or remove.

dump f;
 (,1)
 (1,97)
 (5,49)
 (10,87)
 (20,24)

 describe f;
 f: {group: chararray,long}

Here is what I tried, without success (nothing changed):

remove_tuple = filter f BY group is not null;

hadoop apache-pig

Source: https://stackoverflow.com/questions/33216308/pig-latin-remove-tuple-in-data-bag

3 Answers


x0fgdtte1#

"group" is a Pig keyword. Hopefully this will work if you use a different word for the field name.

Posted 2021-05-30

wkftcu5l2#

I solved this by adding a step that casts the value to int:

e = foreach d generate (int)$0 AS number;  -- this is the key added step

f = group e by number;  -- f: {group: int,e: {(number: int)}}
g = foreach f generate group, COUNT(e);  -- g: {group: int,long}
h = foreach f generate group, SUM(e);
i = filter g by $0 is not null;
dump i;
 (1,97)
 (5,49)
 (10,87)
 (20,24)
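
The cast helps because Pig turns a chararray that cannot be parsed as a number, including an empty string, into null, which the "is not null" filter can then drop. If the values should stay chararray, a similar result may be possible by discarding empty values before grouping. The following is a minimal sketch, not run against the original data; it assumes the relation d and the field number from the question, and the aliases d_clean, by_number, and counts are hypothetical names chosen for illustration.

```pig
-- Assumes d: {number: chararray} as produced by the question's script.
-- Keep only values that are neither null nor empty after trimming whitespace.
d_clean = FILTER d BY number IS NOT NULL AND TRIM(number) != '';

-- Group and count as before; the empty key should no longer appear.
by_number = GROUP d_clean BY number;
counts = FOREACH by_number GENERATE group, COUNT(d_clean);
DUMP counts;
```

Filtering before the GROUP also keeps the empty values out of the grouping step entirely, instead of building a group that is thrown away afterwards.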

Posted 2021-05-29

k97glaaz3#

Null values can be filtered out by using != 'null' as the condition. I used the following as input:

(,1)
(1,97)
(5,49)
(10,87)
(20,24)

Here is how we filter out the nulls:

A = LOAD 'file' using PigStorage(',') AS (a:chararray,b:long);
B = FILTER A BY a!='null';
DUMP B;

So for your script:

remove_tuple = filter f BY group!='null';

Output:

(1,97)
(5,49)
(10,87)
(20,24)
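
One caveat: the comparison above is against the literal string 'null'. In the PigStorage example it drops the row because the empty field is loaded as null, and a comparison involving null does not pass a FILTER. In the question, "group is not null" left (,1) in place, which suggests the key there is an empty string rather than null. A hedged sketch that covers both cases, assuming the relation f from the question (the alias non_empty is hypothetical, and $0 is used instead of the name group to sidestep the keyword):

```pig
-- Assumes f: {group: chararray, long} from the question's script.
-- Drop rows whose first field is null or an empty string.
non_empty = FILTER f BY $0 IS NOT NULL AND $0 != '';
DUMP non_empty;
```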

Posted 2021-05-29
