How to join two relations in Pig on multiple fields

Posted by nimxete2 on 2021-06-02 in Hadoop

I have two CSV files:
1-fertilliy.csv:

2-life expectancy CSV:

I want to join them in Pig so that the result looks like this:

I am new to Pig and cannot get the right result. Here is my code:

fertility = LOAD 'fertility' USING org.apache.hcatalog.pig.HCatLoader();
lifeExpectency = LOAD 'lifeExpectency' USING org.apache.hcatalog.pig.HCatLoader();

A = JOIN fertility by country, lifeExpectency by country;
B = JOIN fertility by year, lifeExpectency by year;
C = UNION A,B;

DUMP C;

Here is the output of my code:

Answer #1 (taor4pac)

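You need to join on country and year together in a single JOIN, then select only the columns needed for the final output. Two separate joins combined with UNION return every pair of rows that matches on either field alone, not the pairs that match on both.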

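-- Load both tables through HCatalog, as in the question.
fertility = LOAD 'fertility' USING org.apache.hcatalog.pig.HCatLoader();
lifeExpectency = LOAD 'lifeExpectency' USING org.apache.hcatalog.pig.HCatLoader();

-- Join on the composite key (country, year) so that both fields must match.
A = JOIN fertility BY (country, year), lifeExpectency BY (country, year);
-- Keep one copy of the key columns plus the two measures.
B = FOREACH A GENERATE fertility::country, fertility::year, fertility::fertility, lifeExpectency::lifeExpectency;
DUMP B;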
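
If the tables are not registered in HCatalog, the same composite-key join can be tried directly on the CSV files. The following is only a sketch under stated assumptions: the file paths, the column names and types in the AS clauses, the absence of a header row, and the relation aliases fert and life are all hypothetical and are not taken from the question.

```pig
-- Assumed paths and schemas; adjust them to the real CSV layout.
-- Assumes comma-separated files without a header row.
fert = LOAD 'fertility.csv' USING PigStorage(',')
    AS (country:chararray, year:int, fertility:double);
life = LOAD 'lifeExpectency.csv' USING PigStorage(',')
    AS (country:chararray, year:int, lifeExpectency:double);

-- Composite-key join: a pair of rows matches only when
-- both country and year are equal.
joined = JOIN fert BY (country, year), life BY (country, year);

-- After the join, every field carries its source relation as a prefix,
-- so the duplicated key columns are disambiguated with relation::field.
result = FOREACH joined GENERATE
    fert::country        AS country,
    fert::year           AS year,
    fert::fertility      AS fertility,
    life::lifeExpectency AS lifeExpectency;

DUMP result;
```

This is an inner join, so a country and year present in only one file is dropped. To keep such rows from the first relation, the join line could instead read JOIN fert BY (country, year) LEFT OUTER, life BY (country, year), which leaves the life expectancy field null where no match exists.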
