Can I keep the items that don't match in a Pig join?

Posted by mdfafbf1 on 2021-06-24 in Pig

I have two relations:

personCounts 
(personName:chararray, count:int)

whitelist
(empID:int, empName:chararray)

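What I want are the people who are in personCounts but not in whitelist. I know that JOIN returns the elements that appear in both. Is there a way to get back the ones that would be discarded? I was thinking I could do it with CROSS, but I think I would end up with extra records...?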

-- CROSS takes no BY clause; it builds the full Cartesian product of both relations.
crossed = CROSS personCounts, whitelist;
filcrs = FILTER crossed BY NOT personCounts::personName MATCHES whitelist::empName;

jyztefdp1#

I think what you are after is the set difference between personCounts and whitelist, right?
If so, try the following (untested!):

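-- Group both relations by name; each tuple is (group, {personCounts bag}, {whitelist bag}).
CGRP = COGROUP personCounts BY personName, whitelist BY empName;
-- Keep only the names that have no whitelist entry.
PC_MINUS_WL = FILTER CGRP BY IsEmpty(whitelist);
PC_MINUS_WL = FOREACH PC_MINUS_WL GENERATE group AS name;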
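The snippet above returns only the name. If the count is needed as well, one option is to flatten the personCounts bag instead of projecting the group key. This is a minimal, untested sketch: the file names person_counts.tsv and whitelist.tsv and the alias NOT_LISTED are hypothetical, and the default tab-delimited PigStorage loader is assumed.

```pig
-- Hypothetical input files; the schemas are the ones from the question.
personCounts = LOAD 'person_counts.tsv' AS (personName:chararray, count:int);
whitelist    = LOAD 'whitelist.tsv'     AS (empID:int, empName:chararray);

-- One tuple per distinct name: (group, {personCounts tuples}, {whitelist tuples}).
CGRP = COGROUP personCounts BY personName, whitelist BY empName;

-- Names that never appear in whitelist have an empty whitelist bag.
NOT_LISTED = FILTER CGRP BY IsEmpty(whitelist);

-- FLATTEN unnests the personCounts bag back into (personName, count) tuples.
PC_MINUS_WL = FOREACH NOT_LISTED GENERATE FLATTEN(personCounts);

DUMP PC_MINUS_WL;
```

After the FLATTEN, the fields are addressed as personCounts::personName and personCounts::count.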

I found the following two resources useful:
http://agiletesting.blogspot.de/2012/02/set-operations-in-apache-pig.html
http://www.cs.tufts.edu/comp/150cpa/notes/advanced_pig.pdf


gcxthw6b2#

You can do this with a full outer join.

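-- Full outer join: unmatched rows get nulls on the side that has no partner.
joined = JOIN personCounts BY personName FULL, whitelist BY empName;
-- Drop rows that exist only in whitelist ($0 is personName).
joined = FILTER joined BY $0 IS NOT NULL;
-- Keep rows with no whitelist match ($3 is empName).
joined = FILTER joined BY $3 IS NULL;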

joined then contains tuples of the form (personName, count, , ), where the two whitelist fields are null.
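Since only the people from personCounts are wanted, a left outer join should be enough, which also removes the need to filter out whitelist-only rows. The following is an untested sketch along those lines: the file names and the aliases unmatched and result are hypothetical, and the fields are referenced by name rather than by position.

```pig
-- Hypothetical input files; the schemas are the ones from the question.
personCounts = LOAD 'person_counts.tsv' AS (personName:chararray, count:int);
whitelist    = LOAD 'whitelist.tsv'     AS (empID:int, empName:chararray);

-- LEFT OUTER keeps every personCounts row; rows without a match get null whitelist fields.
joined = JOIN personCounts BY personName LEFT OUTER, whitelist BY empName;

-- A null empName means the person has no whitelist entry.
unmatched = FILTER joined BY whitelist::empName IS NULL;

-- Project away the null whitelist columns.
result = FOREACH unmatched GENERATE personCounts::personName AS personName,
                                    personCounts::count AS count;

DUMP result;
```

Referencing the fields by name keeps the script valid if the column order of either relation changes.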
