Flattening in Apache Pig

siotufzp  posted on 2021-06-24  in  Pig

My dataset looks like this:

DUMP A;
(10000,({(10000),(20000),(50000)},{(10000),(20000),(30000)}))
(20000,({(10000),(20000),(50000)},{(20000)},{(10000),(20000),(30000)}))
(30000,({(30000)},{(10000),(20000),(30000)}))
(40000,({(40000)},{(40000),(50000)}))
(50000,({(40000),(50000)},{(10000),(20000),(50000)}))
DESCRIBE A;
{foo: bytearray, bar_gp: (baz: {(foo: bytearray)})}

Ultimately I would like it to look like this:

DUMP A;
(10000,{(10000),(20000),(50000),(30000)})
(20000,{(10000),(20000),(50000),(30000)})
(30000,{(10000),(20000),(30000)})
(40000,{(40000),(50000)})
(50000,{(40000),(50000),(10000),(20000)})
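
To make the target concrete, here is a minimal Python sketch of the transformation being asked for: per key, take the union of all bags and drop duplicates. It only models the intended result and is not a Pig solution. The in-memory layout and the helper name `merge_bags` are assumptions made for illustration. Pig bags are unordered, so the sorting below is only there to make the printed output stable.

```python
# Hypothetical in-memory model of relation A: (key, tuple of bags).
# Each bag is a plain list of values; none of these names are Pig APIs.
A = [
    (10000, ([10000, 20000, 50000], [10000, 20000, 30000])),
    (20000, ([10000, 20000, 50000], [20000], [10000, 20000, 30000])),
    (30000, ([30000], [10000, 20000, 30000])),
    (40000, ([40000], [40000, 50000])),
    (50000, ([40000, 50000], [10000, 20000, 50000])),
]


def merge_bags(bags):
    """Union all bags for one key and remove duplicates.

    The result is sorted only so that the output is deterministic;
    the order of elements in a Pig bag carries no meaning.
    """
    return sorted({value for bag in bags for value in bag})


result = [(key, merge_bags(bags)) for key, bags in A]

for row in result:
    print(row)
```

Each key ends up with a single deduplicated bag, which matches the desired output above up to element order.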

I tried the following:

B = FOREACH A GENERATE $0, FLATTEN($1);
C = FOREACH B {D = FOREACH B GENERATE FLATTEN($1); D= DISTINCT D; GENERATE $0, D; }

but I keep getting this error:

expression is not a project expression: (Name: ScalarExpression) Type: null Uid: null)

How can I get the desired output? I know I could write a UDF to handle it, but I would prefer a built-in solution.

ar7v8xwq #1

I think you need to apply DISTINCT to the bag before flattening it.

B = FOREACH A {
   -- deduplicate inside the nested block, then flatten the result
   D = DISTINCT $1;
   GENERATE $0, FLATTEN(D);
};
