How to join bags in Pig

j5fpnvbx posted on 2021-05-29 in Hadoop

I have two data files.
largefile.txt:

1001    {(1,-1),(2,-1),(3,-1),(4,-1)}

smallfile.txt:

1002    {(1,0.04),(2,0.02),(4,0.03)}

I want smallfile.txt to become the following, with the missing time 3 filled in from largefile.txt:

1002    {(1,0.04),(2,0.02),(3,-1),(4,0.03)}

What kind of join can I use to do this? This is how I load the files:

A = LOAD './largefile.txt' USING PigStorage('\t') AS (id:int, a:bag{tuple(time:int,value:float)});

B = LOAD './smallfile.txt' USING PigStorage('\t') AS (id:int, b:bag{tuple(time:int,value:float)});
Answer 1 (tyg4sfes):

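Can you clarify your requirement? Do you want to join largefile.txt and smallfile.txt on matching values in the first column/field (e.g. 1002)? If so, you can simply flatten both bags and join on id: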
A = LOAD './largefile.txt' USING PigStorage('\t') AS (id:int, a:bag{tuple(time:int,value:float)});
A = FOREACH A GENERATE id, FLATTEN(a) AS (time, value);
B = LOAD './smallfile.txt' USING PigStorage('\t') AS (id:int, b:bag{tuple(time:int,value:float)});
B = FOREACH B GENERATE id, FLATTEN(b) AS (time, value);
joined = JOIN A BY id, B BY id;
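The join above pairs records that share an id, but the two sample records have different ids (1001 and 1002), so on its own it does not produce the output asked for in the question. Below is a minimal sketch of one possible approach, assuming that largefile.txt acts as a template whose (time, value) pairs supply the default for every id in smallfile.txt, and that ids in smallfile.txt are unique. It crosses each small-file id with the template times, LEFT OUTER joins the result against the flattened small file on (id, time), falls back to the template value when there is no match, and then regroups the rows into a bag. The intermediate aliases (A_flat, A_times, B_ids, grid, merged, result) are hypothetical names chosen for illustration and are not from the original answer.

```pig
-- Load both files with the schemas from the question
A = LOAD './largefile.txt' USING PigStorage('\t') AS (id:int, a:bag{tuple(time:int,value:float)});
B = LOAD './smallfile.txt' USING PigStorage('\t') AS (id:int, b:bag{tuple(time:int,value:float)});

-- Template rows: one (time, default value) pair per entry in largefile.txt.
-- Assumption: largefile.txt is a template; DISTINCT only drops exact duplicate pairs.
A_flat  = FOREACH A GENERATE FLATTEN(a) AS (time, value);
A_times = DISTINCT A_flat;

-- Actual rows: one (id, time, value) per entry in smallfile.txt
B_flat = FOREACH B GENERATE id, FLATTEN(b) AS (time, value);

-- Every small-file id paired with every template time (assumes unique ids in B)
B_ids   = FOREACH B GENERATE id;
crossed = CROSS B_ids, A_times;
grid    = FOREACH crossed GENERATE B_ids::id AS id, A_times::time AS time,
                                   A_times::value AS default_value;

-- LEFT OUTER keeps grid rows that have no value in smallfile.txt (e.g. time 3)
j = JOIN grid BY (id, time) LEFT OUTER, B_flat BY (id, time);

-- Prefer the small-file value; otherwise use the template default
merged = FOREACH j GENERATE grid::id AS id, grid::time AS time,
         (B_flat::value IS NULL ? grid::default_value : B_flat::value) AS value;

-- Rebuild one bag per id, ordered by time
grouped = GROUP merged BY id;
result  = FOREACH grouped {
    sorted = ORDER merged BY time;
    GENERATE group AS id, sorted.(time, value) AS b;
};

DUMP result;
```

With the sample files, this should give a single 1002 record whose bag contains all four times, with time 3 taking the -1 default from largefile.txt (shown as -1.0, since value is declared as a float). CROSS is reasonable here because the template is tiny; if largefile.txt were large, joining on a shared key would scale better than a full cross product.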
