I have two datasets, one for movies and one for ratings.
The movies data looks like
MovieID#Title#Genre
1#Toy Story (1995)#Animation|Children's|Comedy
2#Jumanji (1995)#Adventure|Children's|Fantasy
3#Grumpier Old Men (1995)#Comedy|Romance
The ratings data looks like
UserID#MovieID#Ratings#RatingsTimestamp
1#1193#5#978300760
1#661#3#978302109
1#914#3#978301968
My script is as follows
1) movies_data = LOAD '/user/admin/MoviesDataset/movies_new.dat' USING PigStorage('#') AS (movieid:int,
moviename:chararray,moviegenere:chararray);
2) ratings_data = LOAD '/user/admin/RatingsDataset/ratings_new.dat' USING PigStorage('#') AS (Userid:int,
movieid:int,ratings:int,timestamp:long);
3) moviedata_ratingsdata_join = JOIN movies_data BY movieid, ratings_data BY movieid;
4) moviedata_ratingsdata_join_group = GROUP moviedata_ratingsdata_join BY movies_data.movieid;
5) moviedata_ratingsdata_averagerating = FOREACH moviedata_ratingsdata_join_group GENERATE group,
AVG(moviedata_ratingsdata_join.ratings) AS Averageratings, (moviedata_ratingsdata_join.Userid) AS userid;
6) DUMP moviedata_ratingsdata_averagerating;
I get this error
2017-03-25 06:46:50,332 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: moviedata_ratingsdata_join_group: Local Rearrange[tuple]{int}(false) - scope-95 Operator Key: scope-95): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: moviedata_ratingsdata_averagerating: New For Each(false,false)[bag] - scope-83 Operator Key: scope-83): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (1,Toy Story (1995),Animation|Children's|Comedy), 2nd :(2,Jumanji (1995),Adventure|Children's|Fantasy) (common cause: "JOIN" then "FOREACH ... GENERATE foo.bar" should be "foo::bar" )
If I remove line 6, the script executes successfully.
Why can't I dump the relation generated in line 5?
1 Answer
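Use the disambiguation operator (::) to identify field names after JOIN, COGROUP, CROSS, or FLATTEN operators. The relations movies_data and ratings_data both have a column named movieid. When forming the relation moviedata_ratingsdata_join_group, use the :: operator to identify which movieid column to use for the GROUP. So your 4) would look like,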
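A minimal sketch of how the full script might read with that change, assuming the same paths and schemas as in the question. The only functional difference is the GROUP key: movies_data.movieid is treated by Pig as a scalar projection of the whole movies_data relation (hence "Scalar has more than one row"), whereas movies_data::movieid names the joined column. The alias movieid on group is an illustrative addition, not part of the original script.

```pig
-- Load both datasets; fields are '#'-delimited (paths taken from the question).
movies_data = LOAD '/user/admin/MoviesDataset/movies_new.dat' USING PigStorage('#')
    AS (movieid:int, moviename:chararray, moviegenere:chararray);
ratings_data = LOAD '/user/admin/RatingsDataset/ratings_new.dat' USING PigStorage('#')
    AS (Userid:int, movieid:int, ratings:int, timestamp:long);

-- After the JOIN, both inputs contribute a movieid column, so each one
-- must be addressed as movies_data::movieid or ratings_data::movieid.
moviedata_ratingsdata_join = JOIN movies_data BY movieid, ratings_data BY movieid;

-- Group on the joined column with '::' (not '.', which is a scalar projection).
moviedata_ratingsdata_join_group = GROUP moviedata_ratingsdata_join
    BY movies_data::movieid;

-- Inside FOREACH, '.' projects a field out of the grouped bag, which is valid.
-- ratings and Userid are unambiguous after the join, so no prefix is needed.
moviedata_ratingsdata_averagerating = FOREACH moviedata_ratingsdata_join_group GENERATE
    group AS movieid,   -- illustrative alias (assumption)
    AVG(moviedata_ratingsdata_join.ratings) AS Averageratings,
    moviedata_ratingsdata_join.Userid AS userid;

DUMP moviedata_ratingsdata_averagerating;
```

Pig builds its plan lazily and only executes it when it reaches a DUMP or STORE, which is most likely why the original script appeared to succeed once line 6 was removed: the faulty GROUP was never actually run.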