Accessing array-like elements in Pig

olmpazwi posted on 2021-06-24 in Pig

My data has the format: id,val1,val2
Example:

1,0.2,0.1
1,0.1,0.7
1,0.2,0.3
2,0.7,0.9
2,0.2,0.3
2,0.4,0.5

First, I want to sort the rows of each id by val1 in descending order:

1,0.2,0.1
1,0.2,0.3
1,0.1,0.7
2,0.7,0.9
2,0.4,0.5
2,0.2,0.3

Then, for each id, I want to select the id,val2 pair of the second row, e.g.:

1,0.3
2,0.5

How should I approach this?
Thanks

bfhwhh0e #1

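Pig is a scripting language rather than a relational language like SQL, so it is well suited to working with groups using operators nested inside FOREACH. Here is a solution: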

A = LOAD 'input' USING PigStorage(',') AS (id:int, v1:float, v2:float);
B = GROUP A BY id; -- isolate all rows for the same id
C = FOREACH B { -- here comes the scripting bit
    elems = ORDER A BY v1 DESC; -- sort rows belonging to the id
    two = LIMIT elems 2; -- select top 2
    two_invers = ORDER two BY v1 ASC; -- sort in opposite order to bubble second value to the top
    second = LIMIT two_invers 1;
    GENERATE FLATTEN(group) as id, FLATTEN(second.v2);
};
DUMP C;

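Note that in your example, id 1 has two rows with v1 == 0.2 but different v2 values. Their relative order after sorting is not defined, so the second value for id 1 can be either 0.1 or 0.3.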
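
If a deterministic result is needed, one option is to add a tie-breaker to the sort. The following is only a sketch, assuming that ties on v1 should be broken by ascending v2 (this rule is not stated in the question; it happens to reproduce the expected 1,0.3) and reusing the same 'input' file and schema as above:

```pig
A = LOAD 'input' USING PigStorage(',') AS (id:int, v1:float, v2:float);
B = GROUP A BY id;
C = FOREACH B {
    -- assumption: break ties on v1 by ascending v2
    sorted = ORDER A BY v1 DESC, v2 ASC;
    two = LIMIT sorted 2; -- first two rows under that ordering
    -- reverse both sort keys so the second row comes first
    two_rev = ORDER two BY v1 ASC, v2 DESC;
    second = LIMIT two_rev 1;
    GENERATE group AS id, FLATTEN(second.v2) AS v2;
};
DUMP C;
```

With the sample data this should give 0.3 for id 1 and 0.5 for id 2. If an id has only one row, that row's v2 is returned.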

dbf7pr2w #2

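-- val1 and val2 are decimals in the sample data, so load them as float rather than int
A = LOAD 'input' USING PigStorage(',') AS (id:int, v1:float, v2:float);
-- sort by id ascending, then by v1 descending within each id
B = ORDER A BY id ASC, v1 DESC;
-- keep only id and v2; this keeps every row and does not pick the second row per id
C = FOREACH B GENERATE id, v2;
DUMP C;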
