NullPointerException when writing Mapper output

Asked by yr9zkbsy on 2021-06-03 · Hadoop

I'm using Amazon EMR. My map/reduce job is configured as follows:

private static final String TEMP_PATH_PREFIX = System.getProperty("java.io.tmpdir") + "/dmp_processor_tmp";
...
private Job setupProcessorJobS3() throws IOException, DataGrinderException {
    String inputFiles = System.getProperty(DGConfig.INPUT_FILES);
    Job processorJob = new Job(getConf(), PROCESSOR_JOBNAME);
    processorJob.setJarByClass(DgRunner.class);
    processorJob.setMapperClass(EntityMapperS3.class);
    processorJob.setReducerClass(SelectorReducer.class);
    processorJob.setOutputKeyClass(Text.class);
    processorJob.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(processorJob, new Path(TEMP_PATH_PREFIX));
    processorJob.setOutputFormatClass(TextOutputFormat.class);

    processorJob.setInputFormatClass(NLineInputFormat.class);
    FileInputFormat.setInputPaths(processorJob, inputFiles);
    NLineInputFormat.setNumLinesPerSplit(processorJob, 10000);

    return processorJob;
}

In my mapper class, I have:

private Text outkey = new Text();
private Text outvalue = new Text();
...
outkey.set(entity.getEntityId().toString());
outvalue.set(input.getId().toString());
printLog("context write");
context.write(outkey, outvalue);

This last line (context.write(outkey, outvalue);) throws the exception below. Both outkey and outvalue are, of course, non-null.

2013-10-24 05:48:48,422 INFO com.s1mbi0se.grinder.core.mapred.EntityMapperCassandra (main): Current Thread: Thread[main,5,main]Current timestamp: 1382593728422 context write
2013-10-24 05:48:48,422 ERROR com.s1mbi0se.grinder.core.mapred.EntityMapperCassandra (main): Error on entitymapper for input: 03a07858-4196-46dd-8a2c-23dd824d6e6e
java.lang.NullPointerException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1293)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1210)
    at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
    at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:264)
    at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:244)
    at org.apache.hadoop.io.Text.write(Text.java:281)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1077)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:698)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at com.s1mbi0se.grinder.core.mapred.EntityMapper.map(EntityMapper.java:78)
    at com.s1mbi0se.grinder.core.mapred.EntityMapperS3.map(EntityMapperS3.java:34)
    at com.s1mbi0se.grinder.core.mapred.EntityMapperS3.map(EntityMapperS3.java:14)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
2013-10-24 05:48:48,422 INFO com.s1mbi0se.grinder.core.mapred.EntityMapperS3 (main): Current Thread: Thread[main,5,main]Current timestamp: 1382593728422 Entity Mapper end
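For what it's worth, the top frame of the trace is System.arraycopy, which throws NullPointerException when its source or destination array is null. That suggests a byte array inside Hadoop's MapOutputBuffer was null at that point (this is an assumption from the trace, not confirmed), rather than the key/value objects themselves, which is why checking outkey and outvalue for null doesn't help. A minimal plain-Java illustration of that failure mode (no Hadoop needed; the class name is illustrative):

```java
// Demonstrates that System.arraycopy itself throws NullPointerException
// when handed a null source array -- the same frame that tops the trace.
public class ArraycopyNpeDemo {
    public static void main(String[] args) {
        byte[] dest = new byte[16];
        byte[] src = null; // stands in for a null internal buffer

        try {
            System.arraycopy(src, 0, dest, 0, 4);
            System.out.println("copied");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException from System.arraycopy");
        }
    }
}
```

So the question is really what left that internal buffer null mid-task, not whether the mapper passed nulls to context.write.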

The first records of each task are processed just fine. Then, at some point during the task, this exception starts being thrown over and over, and from then on the task doesn't process a single record.
I tried setting TEMP_PATH_PREFIX to "s3://mybucket/dmp_processor_tmp", but the same thing happened.
Any idea why this happens? What could be preventing Hadoop from writing its output?

No answers yet.
