Pig script: merge rows after JOIN and GROUP BY

Posted by holgip5t on 2021-06-21 in Pig

Movie table:

id  movie  genre
1   ABC    A|B|C
2   DEF    D|A|F

A movie can have several genres, separated by the | character.
Ratings table:

user_id  movie_id  rating
1        1         3.5
1        2         4.5

Result:
I want one row per user_id containing all the genres of the movies that user rated:

user_id  genres
1        (A|B|C|D|A|F)

Code:

genre_data = join movie by id, ratings by movie_id;
genre_data = group genre_data by (user_id);
user1_data = foreach genre_data generate ratings::user_id, movie::genre;

Answer #1, by u0njafvf

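After the GROUP, each row has only two fields: group (the user_id) and genre_data (a bag holding that user's joined rows). That is why ratings::user_id and movie::genre can no longer be projected directly. Instead, pull the genres out of the bag and concatenate them with BagToString: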

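genre_data = join movie by id, ratings by movie_id;
-- user_id is unambiguous after the join, so the short name works here
genre_data = group genre_data by user_id;

user_data = foreach genre_data {
    -- inside the block, genre_data is the bag of joined rows for the current user
    genre_bag = foreach genre_data generate movie::genre;
    -- join every genre in the bag into one string, separated by '|'
    generate group as user_id, BagToString(genre_bag, '|') as genres;
};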
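For reference, below is a self-contained sketch of the whole pipeline. It assumes the two tables are stored as tab-separated files; the paths movies.tsv and ratings.tsv and the column types are hypothetical. It also adds a nested ORDER before collecting the genres. Pig bags have no guaranteed order, so without it the genres may be concatenated in any sequence.

```pig
-- Hypothetical inputs: tab-separated files, placeholder paths and assumed types.
movie   = LOAD 'movies.tsv'  USING PigStorage('\t') AS (id:int, movie:chararray, genre:chararray);
ratings = LOAD 'ratings.tsv' USING PigStorage('\t') AS (user_id:int, movie_id:int, rating:double);

-- Attach each rating to the movie it refers to.
joined = JOIN movie BY id, ratings BY movie_id;

-- One group per user; each group holds a bag of that user's joined rows.
by_user = GROUP joined BY ratings::user_id;

user_data = FOREACH by_user {
    -- Bags are unordered; sort by movie id first so the concatenation order is reproducible.
    sorted    = ORDER joined BY movie::id;
    genre_bag = FOREACH sorted GENERATE movie::genre;
    -- Concatenate the genre strings with '|' as the delimiter.
    GENERATE group AS user_id, BagToString(genre_bag, '|') AS genres;
};

DUMP user_data;
```

It can be tried locally with pig -x local followed by the script file name.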
