COUNT/SUM in Apache Pig

Posted by n3h0vuf2 on 2021-06-21 in Pig

I'm a beginner with Apache Pig. I have a table with the following fields:

table - amount:long date:string country:string

My initial goal was to get the count of the amount field for each country, per month. For example, this is the final result I needed:

(Exhibit A)
201201 USA 100
201201 UK 150
201305 ITALY 200
201305 USA 120
201305 UK 20
201403 ITALY 300

The numbers (100, 150, 200, 300, ...) are the counts for each date and country. I got the result above with the following script:

data = ORDER table BY date ASC;

data1 = GROUP data BY (date, country);

countof_amount = FOREACH data1 GENERATE
             FLATTEN(group) AS (date, country),
             COUNT(data) AS amount_count;

countof_amount1 = order countof_amount by date ASC;

Now I want the sum of those counts across all countries for each date. For example, from Exhibit A I would like to get:

201201 250
201305 340
201403 300

How can I do this?
Thanks in advance!

Answer #1 (pxiryf3j)

Just add the last three lines. I tested it locally and it works fine.

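-- Load the space-separated input: amount date country
table = LOAD 'input.txt' using PigStorage(' ') as(amount:long,date:chararray,country:chararray);
-- Sorting here is optional; GROUP does not depend on input order.
data = ORDER table BY date ASC;
-- Step 1: count rows per (date, country), as in the question (Exhibit A)
data1 = GROUP data BY (date,country);
countof_amount = FOREACH data1 GENERATE
            FLATTEN(group) AS (date, country),
            COUNT(data.amount) AS (amount_count);
countof_amount1 = order countof_amount by date ASC;

-- Step 2 (the three added lines): regroup the per-country counts by date and sum them
mycount = group countof_amount1 by date;
getFinalCount = FOREACH mycount GENERATE group as date, SUM(countof_amount1.amount_count) as total;
dump getFinalCount;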
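The per-date total is the sum of the per-country counts for that date, which is the same as counting all rows for that date. If only the totals are needed, a single GROUP BY date should give the same numbers without the intermediate relation. This is a minimal sketch that assumes the same space-separated input.txt as above. The relation names records, by_date, totals and totals_sorted are illustrative and not from the original answer.

```pig
-- Sketch: per-date totals in one grouping step (illustrative relation names).
-- Assumes the same space-separated input.txt: amount date country
records = LOAD 'input.txt' USING PigStorage(' ')
          AS (amount:long, date:chararray, country:chararray);

-- Put all rows for a date in one bag, regardless of country.
by_date = GROUP records BY date;

-- COUNT skips tuples whose first field (amount) is null, matching
-- COUNT(data.amount) in the two-step script; COUNT_STAR would count every row.
totals = FOREACH by_date GENERATE group AS date, COUNT(records) AS total;

-- yyyyMM strings sort chronologically, so a plain ORDER gives date order.
totals_sorted = ORDER totals BY date ASC;
DUMP totals_sorted;
```

If you also need the per-country counts from Exhibit A, the two-step script above produces both relations, so it remains the better fit for that case.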
