Apache Pig: Elephant Bird JSON loader

dm7nw8vv  posted on 2021-06-02  in Hadoop

I am trying to parse the input below with the Elephant Bird JSON loader (the input contains 2 records):
[{"node_disk_lnum_1":36,"node_disk_xfers_in_rate_sum":136.40000000000001,"node_disk_bytes_in_rate_22":187392.0,"node_disk_lnum_7":13}]
[{"node_disk_lnum_1":36,"node_disk_xfers_in_rate_sum":105.2,"node_disk_bytes_in_rate_22":123084.8,"node_disk_lnum_7":13}]
Here is my script:

register '/home/data/Desktop/elephant-bird-pig-4.1.jar';

a = LOAD '/pig/tc1.log' USING 
com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') as (json:map[]);

b = FOREACH a GENERATE flatten(json#'node_disk_lnum_1') AS 
node_disk_lnum_1,flatten(json#'node_disk_xfers_in_rate_sum') AS 
node_disk_xfers_in_rate_sum,flatten(json#'node_disk_bytes_in_rate_22') AS 
node_disk_bytes_in_rate_22, flatten(json#'node_disk_lnum_7') AS
node_disk_lnum_7;

DESCRIBE b;

Result of DESCRIBE b:
b: {node_disk_lnum_1: bytearray, node_disk_lnum_7: bytearray}

c = FOREACH b GENERATE node_disk_lnum_1;

DESCRIBE c;

c: {node_disk_lnum_1: bytearray}

DUMP c;

Expected result:
36, 136.40000000000001, 187392.0, 13
36, 105.2, 123084.8, 13
Instead, the following error is thrown:
2017-02-06 01:05:49,337 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2017-02-06 01:05:49,386 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2017-02-06 01:05:49,387 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2017-02-06 01:05:49,390 [main] INFO  org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Map key required for a: $0->[node_disk_lnum_1, node_disk_xfers_in_rate_sum, node_disk_bytes_in_rate_22, node_disk_lnum_7]
2017-02-06 01:05:49,395 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2017-02-06 01:05:49,398 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2017-02-06 01:05:49,398 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2017-02-06 01:05:49,425 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2017-02-06 01:05:49,426 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2017-02-06 01:05:49,428 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. com/twitter/elephantbird/util/HadoopCompat
Please help. What am I missing?

zkure5ic  #1

Your JSON does not contain any nested data, so remove the -nestedLoad option:

a = LOAD '/pig/tc1.log' USING com.twitter.elephantbird.pig.load.JsonLoader() as (json:map[]);
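For context, here is a rough sketch of how the whole script might look with that change applied. It is not a verified fix. The extra REGISTER lines are an assumption: the error text names com/twitter/elephantbird/util/HadoopCompat, which looks like a class missing from the classpath, and that class ships in the separate elephant-bird-hadoop-compat jar. The jar file names, versions, and directory below are hypothetical and would need to match what is actually installed. The sketch also drops FLATTEN, since each map lookup here returns a single scalar value.

```pig
-- Assumed jar names and locations (hypothetical); adjust to the real files.
REGISTER '/home/data/Desktop/elephant-bird-core-4.1.jar';
REGISTER '/home/data/Desktop/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/home/data/Desktop/elephant-bird-pig-4.1.jar';
REGISTER '/home/data/Desktop/json-simple-1.1.jar';

-- Load each line as a flat map; no -nestedLoad because the records
-- contain no nested objects.
a = LOAD '/pig/tc1.log'
    USING com.twitter.elephantbird.pig.load.JsonLoader()
    AS (json:map[]);

-- Each lookup yields one scalar value, so FLATTEN is not required.
b = FOREACH a GENERATE
    json#'node_disk_lnum_1'            AS node_disk_lnum_1,
    json#'node_disk_xfers_in_rate_sum' AS node_disk_xfers_in_rate_sum,
    json#'node_disk_bytes_in_rate_22'  AS node_disk_bytes_in_rate_22,
    json#'node_disk_lnum_7'            AS node_disk_lnum_7;

DUMP b;
```

If the same internal error still appears after registering the jars, comparing the Elephant Bird version against the Hadoop version in use would be a reasonable next check.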
