Apache Pig and counting

2exbekwf  posted on 2021-06-24  in Pig

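I am working on the following question: how many female users have given at least one rating of 4?
I think my join and filters are correct, but I can't figure out the distinct count part. I have tried many versions of the script below.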

a = load '/user/pig/movie' AS (userid:int, movieid:int, rating:int, timestamp:chararray);
b = load '/user/pig/reviewer' using PigStorage('|') AS (userid:int, age:int, gender:chararray, occupation:chararray, zip:chararray);
a1 = filter a by rating == 4;
b1 = filter b by gender == 'F';
c = join a1 by userid, b1 by userid;
d = FOREACH c GENERATE COUNT(DISTINCT(userid));
dump d;

twh00eeo  #1

You have to group before counting. Reference: COUNT requires a preceding GROUP ALL statement for a global count and a GROUP BY statement for per-group counts.

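d = GROUP c ALL;  -- one group holding every joined row, for a global count
e = FOREACH d {
    users = DISTINCT c.b1::userid;  -- deduplicate user ids inside the group
    GENERATE COUNT(users);
};
dump e;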
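
The group-before-count rule can also be applied by deduplicating first and grouping afterwards. The following is only a sketch of the whole script under that approach. It reuses the paths and schemas from the question, while the alias names (ratings, reviewers, fours, females, joined, ids, uniq_ids, all_ids, total, female_users) are made up for illustration.

```pig
-- Load both inputs with the schemas given in the question
ratings   = LOAD '/user/pig/movie' AS (userid:int, movieid:int, rating:int, timestamp:chararray);
reviewers = LOAD '/user/pig/reviewer' USING PigStorage('|') AS (userid:int, age:int, gender:chararray, occupation:chararray, zip:chararray);

-- Keep only ratings of 4 and only female reviewers, then join on userid
fours   = FILTER ratings BY rating == 4;
females = FILTER reviewers BY gender == 'F';
joined  = JOIN fours BY userid, females BY userid;

-- After the join both sides carry a userid, so it must be disambiguated with ::
ids      = FOREACH joined GENERATE females::userid AS userid;
uniq_ids = DISTINCT ids;  -- one row per user, however many 4 ratings they gave

-- COUNT works on a bag, so collect all rows into a single group first
all_ids = GROUP uniq_ids ALL;
total   = FOREACH all_ids GENERATE COUNT(uniq_ids) AS female_users;
DUMP total;
```

Using GROUP ... BY userid here instead would produce one output row per user, not a single total, which is why GROUP ALL is the form that matches the question.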
