Pig Latin LIMIT operator applied to each group

juud5qan  posted on 2021-06-24 in Pig

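I am trying to return only the five largest places by population in each state. I also want the result sorted by state name, with each state's places in descending order of population. My current script returns only the first five tuples of the whole sorted relation, not the top five places for each state.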

-- Groups places by state name.
group_by_state_name_populated_place_name =
    GROUP project_using_state_name
    BY (state::name, place::name);

-- Counts population for each place in every state.
count_population_for_each_place_in_every_state =
    FOREACH group_by_state_name_populated_place_name
    GENERATE group.state::name AS state_name,
             group.place::name AS name,
             COUNT(project_using_state_name.population) AS population;

-- Orders population in each group found above to enable the use of limit.
order_groups_of_states_and_population =
    ORDER count_population_for_each_place_in_every_state 
    BY state_name ASC, population DESC, name ASC;

-- Limit the top 5 population for each state BUT currently returning just the first 5 tuples of the previous one and not 5 of each state.
limit_population =
    LIMIT order_groups_of_states_and_population 5;

Answer 1 (zwghvu4y)

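The following snippet may help. It groups the rows by state and then sorts and limits inside each group, so LIMIT applies per state rather than to the whole relation.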

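-- Load one row per place: state, place name, population.
inp_data = LOAD 'input_data.csv' USING PigStorage(',') AS (state:chararray, place:chararray, population:long);

-- Nested FOREACH: ORDER and LIMIT run inside each state's bag.
req_stats = FOREACH (GROUP inp_data BY state) {
    ordered = ORDER inp_data BY population DESC;
    required = LIMIT ordered 5;
    -- FLATTEN turns each state's five-row bag back into plain rows.
    GENERATE FLATTEN(required);
};

-- Sort the combined result by state, largest places first within each state.
req_stats_ordered = ORDER req_stats BY state, population DESC;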

DUMP req_stats_ordered;
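
The same pattern can probably be applied directly to the relations from the question. The following is a minimal, untested sketch: it assumes count_population_for_each_place_in_every_state has the schema shown in the question (state_name, name, population), and the relation names places_by_state, top_five_places_per_state and ordered_top_five_places are hypothetical names introduced here for illustration.

```pig
-- Regroup the per-place rows by state only, so each group holds one state's places.
places_by_state =
    GROUP count_population_for_each_place_in_every_state
    BY state_name;

-- Inside each state's bag: sort by population and keep at most five rows.
top_five_places_per_state =
    FOREACH places_by_state {
        ordered_places = ORDER count_population_for_each_place_in_every_state
                         BY population DESC, name ASC;
        top_places = LIMIT ordered_places 5;
        GENERATE FLATTEN(top_places);
    };

-- Final presentation order: state name ascending, then population descending.
ordered_top_five_places =
    ORDER top_five_places_per_state
    BY state_name ASC, population DESC, name ASC;

DUMP ordered_top_five_places;
```

This would replace the final ORDER and LIMIT steps in the question. The key difference is that LIMIT sits inside the nested FOREACH block, so it is evaluated once per state instead of once for the whole relation.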
