2, cornflakes, Regular,General Mills, 12
3, cornflakes, Mixed Nuts, Post, 14
4, chocolate syrup, Regular, Hersheys, 5
5, chocolate syrup, No High Fructose, Hersheys, 8
6, chocolate syrup, Regular, Ghirardeli, 6
7, chocolate syrup, Strawberry Flavor, Ghirardeli, 7
Script
data_grp = GROUP data BY (item, type);
data_cnt = FOREACH data_grp GENERATE FLATTEN(group) AS (item, type), COUNT(data) AS total;
filter_data = FILTER data_cnt BY total < 2;
Now I need the original rows for the groups that pass the filter. The output I want is:
4, chocolate syrup, Regular, Hersheys, 5
6, chocolate syrup, Regular, Ghirardeli, 6
1 answer
rdrgkggo #1
filter_data will give you
chocolate syrup, Regular
Join filter_data with the original data set on item and type to get the desired result.
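The join described above could look roughly like the sketch below. It assumes the relation `data` from the question has exactly five fields in the order shown in the sample rows, with `item` and `type` as the second and third; the relation names `joined` and `result` are made up for illustration. Positional references are used because the question does not name the other columns.

```pig
-- Inner join: keeps only the original rows whose (item, type) pair survived the filter.
joined = JOIN data BY (item, type), filter_data BY (item, type);

-- The joined relation holds data's fields first, then filter_data's.
-- Project only the first five positions (assumed to be the original columns).
result = FOREACH joined GENERATE $0, $1, $2, $3, $4;

DUMP result;
```

Because this is an inner join, every row in `data` whose (item, type) pair is absent from filter_data is dropped, which is what turns the group-level filter back into a row-level one.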