Pig: sum, division, and creating an object

bz4sfanl  posted on 2021-06-25 in Pig

I am writing a Pig program that loads a file whose fields are separated by tabs.
For example: name<tab>year<tab>count<tab>...

file = LOAD 'file.csv' USING PigStorage('\t') as (type: chararray, year: chararray,
match_count: float, volume_count: float);

-- Group by type
grouped = GROUP file BY type;

-- Flatten
by_type = FOREACH grouped GENERATE FLATTEN(group) AS (type, year, match_count, volume_count);

group_operat = FOREACH by_type GENERATE  
        SUM(match_count) AS sum_m,
        SUM(volume_count) AS sum_v,
       (float)sum_m/sm_v;

DUMP group_operat;

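The problem is with the group_operat object I am trying to create. I want to sum all the match counts, sum all the volume counts, and then divide the match-count sum by the volume-count sum.
What am I doing wrong in my arithmetic / object creation? One error I get is: <line 7, column 11> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatible schema: left is "type:NULL,year:NULL,match_count:NULL,volume_count:NULL", right is "group:chararray"
Thank you.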

h6my8fg2 #1

Try this:

file = LOAD 'file.csv' USING PigStorage('\t') as (type: chararray, year: chararray,
match_count: float, volume_count: float);

grouped = GROUP file BY (type,year);

group_operat = FOREACH grouped GENERATE group,
        SUM(file.match_count) AS sum_m,
        SUM(file.volume_count) AS sum_v,
       (float)(SUM(file.match_count)/SUM(file.volume_count)) as sum_mv;

The script above gives results grouped by type and year. If you only want to group by type, remove year from the grouping:

grouped = GROUP file BY type;

group_operat = FOREACH grouped GENERATE group,file.year,
        SUM(file.match_count) AS sum_m,
        SUM(file.volume_count) AS sum_v,
       (float)(SUM(file.match_count)/SUM(file.volume_count)) as sum_mv;
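
A note on the ERROR 1031 from the question, offered as a sketch and not a definitive diagnosis: after `GROUP file BY type`, the relation has only two fields, the key `group` (a single chararray) and a bag named `file` holding the original rows. `FLATTEN(group)` therefore yields one field, which cannot be given four names in the AS clause. When the key is a tuple such as `(type, year)`, flattening it into exactly two names should work. The alias `by_type_year` below is made up for illustration; everything else follows the scripts above.

```pig
-- Assumes the same tab-separated file.csv as in the question.
file = LOAD 'file.csv' USING PigStorage('\t') AS (type: chararray, year: chararray,
match_count: float, volume_count: float);

grouped = GROUP file BY (type, year);

-- Print the schema: expect the key `group` plus a bag named `file`.
DESCRIBE grouped;

-- `group` is a (type, year) tuple here, so FLATTEN yields exactly two fields.
-- The aggregates read their columns from the bag, hence file.match_count.
by_type_year = FOREACH grouped GENERATE
        FLATTEN(group) AS (type, year),
        SUM(file.match_count) AS sum_m,
        SUM(file.volume_count) AS sum_v,
        (float)(SUM(file.match_count) / SUM(file.volume_count)) AS sum_mv;

DUMP by_type_year;
```

Running `DESCRIBE` on any intermediate relation is a quick way to check what fields an AS clause has to match.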

cunj1qz1 #2

Try it like this; it returns each type together with the ratio of its two sums.
Updated with working code.
Input file:

A       2001     10      2
A       2002     20      3
B       2003     30      4
B       2004     40      1

Pig script:

file = LOAD 'input.txt' USING PigStorage() AS (type: chararray, year: chararray,
match_count: float, volume_count: float);
grouped = GROUP file BY type;
group_operat = FOREACH grouped {
                                 sum_m = SUM(file.match_count);
                                 sum_v = SUM(file.volume_count);
                                 GENERATE group,(float)(sum_m/sum_v) as sum_mv;
                                }
DUMP group_operat;

Output:

(A,6.0)
(B,14.0)
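
To check the arithmetic by hand: type A sums to 10 + 20 = 30 matches and 2 + 3 = 5 volumes, so 30 / 5 = 6; type B sums to 30 + 40 = 70 and 4 + 1 = 5, so 70 / 5 = 14. If you also want the two sums in the result, a small variation of the script (a sketch, assuming the same input.txt) could emit them next to the ratio. The output names `type`, `sum_m`, and `sum_v` are only labels chosen here.

```pig
file = LOAD 'input.txt' USING PigStorage() AS (type: chararray, year: chararray,
match_count: float, volume_count: float);
grouped = GROUP file BY type;
group_operat = FOREACH grouped {
                                 -- Aggregate each column of the per-type bag once.
                                 sum_m = SUM(file.match_count);
                                 sum_v = SUM(file.volume_count);
                                 -- Emit the key, both sums, and their ratio.
                                 GENERATE group AS type, sum_m AS sum_m, sum_v AS sum_v,
                                          (float)(sum_m/sum_v) AS sum_mv;
                                }
DUMP group_operat;
```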
