I have two data files.
largefile.txt:
1001 {(1,-1),(2,-1),(3,-1),(4,-1)}
smallfile.txt:
1002 {(1,0.04),(2,0.02),(4,0.03)}
I want smallfile.txt to end up like this:
1002 {(1,0.04),(2,0.02),(3,-1),(4,0.03)}
What kind of join can I use to do this?
A = LOAD './largefile.txt' USING PigStorage('\t') AS (id:int, a:bag{tuple(time:int,value:float)});
B = LOAD './smallfile.txt' USING PigStorage('\t') AS (id:int, b:bag{tuple(time:int,value:float)});
1 Answer
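Could you clarify your requirement? Do you want to join largefile.txt and smallfile.txt on matching values in the first column/field (for example, 1002)? If so, you can simply do the following: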
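-- Load each file and flatten its bag into one (id, time, value) row per tuple
A = LOAD './largefile.txt' USING PigStorage('\t') AS (id:int, a:bag{tuple(time:int,value:float)});
A = FOREACH A GENERATE id, FLATTEN(a) AS (time, value);
B = LOAD './smallfile.txt' USING PigStorage('\t') AS (id:int, b:bag{tuple(time:int,value:float)});
B = FOREACH B GENERATE id, FLATTEN(b) AS (time, value);
-- Inner join on the first field
joined = JOIN A BY id, B BY id;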
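Note that in the sample data the ids differ (1001 vs. 1002), so an inner join on id alone would return no rows for those two lines. If the goal is instead what the expected output in the question suggests, i.e. filling time slots missing from smallfile.txt with the values from largefile.txt, one possible approach is a left outer join on time. The following is only a sketch under assumptions that are not stated in the question: largefile.txt is treated as a set of default (time, value) pairs whose id is ignored, and every alias other than A and B (defaults, ids, grid, overrides, merged, filled, grouped, result) is a hypothetical name introduced for illustration.

```pig
-- Sketch only: assumes largefile.txt supplies defaults and its id is not used for matching.
A = LOAD './largefile.txt' USING PigStorage('\t')
    AS (id:int, a:bag{tuple(time:int,value:float)});
B = LOAD './smallfile.txt' USING PigStorage('\t')
    AS (id:int, b:bag{tuple(time:int,value:float)});

-- One row per default (time, value) pair from the large file
defaults = FOREACH A GENERATE FLATTEN(a) AS (time:int, dvalue:float);

-- Pair every id from the small file with every default time slot
ids  = FOREACH B GENERATE id;
grid = CROSS ids, defaults;

-- One row per (id, time, value) actually present in the small file
overrides = FOREACH B GENERATE id, FLATTEN(b) AS (time:int, value:float);

-- Keep every grid row; overrides::value is null where the small file has no entry
merged = JOIN grid BY (ids::id, defaults::time) LEFT OUTER, overrides BY (id, time);

-- Prefer the small-file value, fall back to the default
filled = FOREACH merged GENERATE
    grid::ids::id AS id,
    grid::defaults::time AS time,
    (overrides::value IS NULL ? grid::defaults::dvalue : overrides::value) AS value;

-- Rebuild one bag per id
grouped = GROUP filled BY id;
result  = FOREACH grouped GENERATE group AS id, filled.(time, value) AS b;
```

CROSS can be expensive on large inputs, and Pig does not guarantee the order of tuples inside the rebuilt bag, so a nested ORDER would be needed if the (time, value) pairs must come out sorted by time.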