Creating a custom surrogate key in Pig

noj0wjuj  posted on 2021-06-21  in Pig

Is there a way to create a custom surrogate key in Pig?
For example, we have the following data:

Salary City Name

20000 newyork john   
30000 sydney joseph   
60000 delhi mike   
30000 sydney joseph

For this data, we need to create surrogate keys so that the result looks like this:

Key Salary City Name

SCN1 20000 newyork john    
SCN2 30000 sydney joseph   
SCN3 60000 delhi mike  
SCN2 30000 sydney joseph

We want this instead of generating random unique keys.
Thanks in advance!

7gyucuyw1#

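First apply DISTINCT to the data, then use RANK and CONCAT to build a custom key for each distinct row. Next, join the distinct set back to the original data set. Finally, generate the required columns.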

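-- Load the tab-separated input; with no schema, fields are referenced as $0, $1, $2
A = LOAD 'data.txt' USING PigStorage('\t');
-- Keep one copy of each row
B = DISTINCT A;
-- Prepend a sequential rank (a long) as the new $0
C = RANK B;
-- Build the key; the explicit chararray cast is a precaution because CONCAT works on strings
D = FOREACH C GENERATE CONCAT('SCN', (chararray)$0), $1, $2, $3;
-- Attach each key to every matching original row
E = JOIN A BY ($0,$1,$2), D BY ($1,$2,$3);
-- In E, $0..$2 come from A and $3 is the key from D
F = FOREACH E GENERATE $3, $0, $1, $2;
DUMP F;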

Here is how it processes the sample data:

a

20000 newyork john   
30000 sydney joseph   
60000 delhi mike   
30000 sydney joseph

b

20000 newyork john   
30000 sydney joseph   
60000 delhi mike

c

1 20000 newyork john   
2 30000 sydney joseph   
3 60000 delhi mike

d

SCN1 20000 newyork john   
SCN2 30000 sydney joseph   
SCN3 60000 delhi mike

e

20000 newyork john SCN1 20000 newyork john     
30000 sydney joseph SCN2 30000 sydney joseph   
60000 delhi mike SCN3 60000 delhi mike 
30000 sydney joseph SCN2 30000 sydney joseph

f

SCN1 20000 newyork john    
SCN2 30000 sydney joseph   
SCN3 60000 delhi mike  
SCN2 30000 sydney joseph
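
The DISTINCT, RANK, CONCAT, JOIN sequence walked through above can also be modelled outside Pig to sanity-check the expected result. What follows is a minimal Python sketch, not Pig code. The function name `assign_surrogate_keys` and the `prefix` parameter are hypothetical, and numbering rows in first-appearance order is an assumption that happens to match the walkthrough rather than something Pig guarantees.

```python
def assign_surrogate_keys(rows, prefix="SCN"):
    """Model of DISTINCT -> RANK -> CONCAT -> JOIN on a list of tuples."""
    # DISTINCT: keep the first occurrence of each row (dict preserves insertion order).
    distinct_rows = list(dict.fromkeys(rows))
    # RANK + CONCAT: number the distinct rows from 1 and prepend the prefix.
    key_of = {row: f"{prefix}{rank}" for rank, row in enumerate(distinct_rows, start=1)}
    # JOIN + final FOREACH: look up each original row's key and put it first.
    return [(key_of[row], *row) for row in rows]


rows = [
    (20000, "newyork", "john"),
    (30000, "sydney", "joseph"),
    (60000, "delhi", "mike"),
    (30000, "sydney", "joseph"),
]
keyed = assign_surrogate_keys(rows)
for record in keyed:
    print(record)
```

Unlike a Pig JOIN, this sketch keeps the rows in their original input order; the key assigned to each row is the part being illustrated.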

xe55xuns2#

Thanks, Curious Mind, for helping me generate unique surrogate keys. Here is the Pig script, which I have tested and which works fine.

A = LOAD '/user/root5/data3.txt' USING PigStorage(',');
B = DISTINCT A;
C = RANK B;
D = FOREACH C GENERATE CONCAT('SCN',$0),$1,$2,$3;
E = JOIN A BY ($0,$1,$2),D BY ($1,$2,$3);
F = FOREACH E GENERATE $3, $0, $1, $2;
DUMP F;

The output of each step is as follows:

DUMP A;
(20000,newyork,john)
(30000,sydney,joseph)
(60000,delhi,mike)
(20000,newyork,john)
(30000,sydney,mike)
(60000,delhi,mike)  

DUMP B;
(20000,newyork,john)
(30000,sydney,mike)
(30000,sydney,joseph)
(60000,delhi,mike)

DUMP C;
(1,20000,newyork,john)
(2,30000,sydney,mike)
(3,30000,sydney,joseph)
(4,60000,delhi,mike)

DUMP D;
(SCN1,20000,newyork,john)
(SCN2,30000,sydney,mike)
(SCN3,30000,sydney,joseph)
(SCN4,60000,delhi,mike)

DUMP E;
(20000,newyork,john,SCN1,20000,newyork,john)
(20000,newyork,john,SCN1,20000,newyork,john)
(30000,sydney,mike,SCN2,30000,sydney,mike)
(30000,sydney,joseph,SCN3,30000,sydney,joseph)
(60000,delhi,mike,SCN4,60000,delhi,mike)
(60000,delhi,mike,SCN4,60000,delhi,mike)

DUMP F;
(SCN1,20000,newyork,john)
(SCN1,20000,newyork,john)
(SCN2,30000,sydney,mike)
(SCN3,30000,sydney,joseph)
(SCN4,60000,delhi,mike)
(SCN4,60000,delhi,mike)
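
In the DUMP B output above, (30000,sydney,mike) comes before (30000,sydney,joseph), so the number each row receives depends on the order in which the distinct rows arrive. If reproducible keys matter, one option is to sort the distinct rows before numbering them. The following is a minimal Python sketch of that idea, not Pig code; `assign_sorted_keys` is a hypothetical name, and sorting by the full row is an assumption about which order is wanted.

```python
def assign_sorted_keys(rows, prefix="SCN"):
    """Number distinct rows in sorted order so the same input always yields the same keys."""
    # sorted(set(...)) gives a deterministic order regardless of input order.
    distinct_sorted = sorted(set(rows))
    key_of = {row: f"{prefix}{i}" for i, row in enumerate(distinct_sorted, start=1)}
    # Prepend each row's key, keeping the rows in their input order.
    return [(key_of[row], *row) for row in rows]


data = [
    (20000, "newyork", "john"),
    (30000, "sydney", "joseph"),
    (60000, "delhi", "mike"),
    (20000, "newyork", "john"),
    (30000, "sydney", "mike"),
    (60000, "delhi", "mike"),
]
keys_forward = assign_sorted_keys(data)
keys_reversed = assign_sorted_keys(list(reversed(data)))
```

Both calls map every row to the same key even though the input order differs. In Pig itself, the RANK operator also accepts a BY clause, which may serve a similar purpose there.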
