我有一个按用户分级的电影列表。
{"_id":59607,"title":"King Corn (2007)",
"genres":["Documentary"],
"ratings":[ {"userId":1860,"rating":3},
{"userId":9970,"rating":3.5},
{"userId":16929,"rating":1.5},
{"userId":23473,"rating":4},
{"userId":23733,"rating":4},
{"userId":27584,"rating":3},
{"userId":28232,"rating":4},
{"userId":29482,"rating":3},
{"userId":40976,"rating":5},
{"userId":44631,"rating":4},
{"userId":47613,"rating":3},
{"userId":49763,"rating":3},
{"userId":58160,"rating":4.5},
{"userId":62249,"rating":3},
{"userId":65923,"rating":4},
{"userId":67507,"rating":4},
{"userId":68259,"rating":3.5},
{"userId":70331,"rating":5},
{"userId":71420,"rating":3.5}
]
}
我需要计算一下每个用户都做了多少评分。这是我想进入收视率的尝试。
a = load '/movies_1m.json' using JsonLoader('id:int, title : chararray, genres : { ( genre : chararray ) }, ratings: { ( userId : int, rating: float) } ');
然后
b = FOREACH a GENERATE FLATTEN(ratings);
请描述以下内容:
b: {ratings::userId: int,ratings::rating: float}
只是为了计算我需要访问评级内部的用户数。但这正是它不成功的地方。我试过这个:
c = FOREACH b GENERATE COUNT(ratings);
这让我犯了个错误。
我需要这样的东西:
{userId: int, rating: float}
1条答案
按热度按时间xyhw6mcr1#
你需要
GROUP
为了COUNT
因为这是一个聚合操作。输出
注意,您的示例中没有一个用户重复,因此这些都是一个。