How to sum values across two log files in Pig

vh0rcniy  posted on 2021-05-29  in Hadoop

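I have a problem summing two log files.
Sample files:
File-1
id user view
1 AAA 2
2 BBB 5
3 CCC 9
File-2
id user view address
1 AAA 5 XXX
2 BBB 2 YYY
6 FFF 4 ZZZ
I want to combine the two files, matching rows on id and summing the view column. I expect this result:
Output: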

id user view address
1  AAA  7    XXX
2  BBB  7    YYY

I tried code that joins the two files, but it does not sum the views from both files.
My code:

inputdata = LOAD '/user/hdfs/tes/part-1' AS (
    id:chararray, 
    user:chararray, 
    view:int
);

inputdata2 = LOAD '/user/hdfs/tes/part-2' AS (
    id:chararray, 
    user:chararray, 
    view:int,
    address:chararray
);

joined = JOIN inputdata BY id LEFT OUTER, inputdata2 by id;

outputlist = FOREACH joined {

        GENERATE
        inputdata::id, 
        inputdata::user, 
        --sum(inputdata2::view), 
        inputdata2::address;

}

dump outputlist;

My question: how do I sum the view values across the two log files?
Thanks.

5anewei6  #1

Take the join result in a FOREACH and add the two view values together. This will work.

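-- Load both files as space-delimited records.
A = LOAD 'file1.dat' USING PigStorage(' ') AS (a:chararray, b:chararray, c:int);
B = LOAD 'file2.dat' USING PigStorage(' ') AS (a:chararray, b:chararray, c:int, d:chararray);
-- Inner join on id: only ids present in both files are kept.
C = JOIN A BY a, B BY a;
-- Add the two view columns; the address comes from the second file.
D = FOREACH C GENERATE A::a AS id, A::b AS user, A::c + B::c AS view, B::d AS address;
DUMP D;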

Output:

(1,AAA,7,XXX)
(2,BBB,7,YYY)
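
The inner join above drops ids that appear in only one file, which matches the expected output in the question. If the LEFT OUTER join from the question's code were kept instead, B::c would be null for unmatched rows and the sum would come out null as well. A minimal sketch of one way to handle that, assuming the same space-delimited file1.dat and file2.dat and that a missing view should count as 0 (an assumption, not something the question states):

```pig
A = LOAD 'file1.dat' USING PigStorage(' ') AS (a:chararray, b:chararray, c:int);
B = LOAD 'file2.dat' USING PigStorage(' ') AS (a:chararray, b:chararray, c:int, d:chararray);

-- Keep every row of A, even when B has no matching id.
C = JOIN A BY a LEFT OUTER, B BY a;

-- Treat a missing B::c as 0 so the addition does not yield null.
D = FOREACH C GENERATE
        A::a AS id,
        A::b AS user,
        A::c + (B::c IS NULL ? 0 : B::c) AS view,
        B::d AS address;

DUMP D;
```

With the sample data, ids 1 and 2 should still sum to 7, while id 3 would be kept with its own view of 9 and a null address. Id 6 exists only in the second file, so a left outer join from A does not include it.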
