Hadoop MapReduce IndexOutOfBoundsException

d6kp6zgx asked on 2021-06-02 in Hadoop
Follow (0) | Answers (2) | Views (472)

My program runs fine on smaller inputs, but when I increase the input size, line 210 (context.nextKeyValue();) seems to throw an IndexOutOfBoundsException. Below is the Mapper's setup method. I call nextKeyValue() once there because the first line of every file is a header; splitting of the input files is set to false because of that header. Could this be memory-related? How can I solve it?
Also, the error message below is shown 68 times, even though I have set the max map attempts to 3. By the way, there are 55 splits in total. Shouldn't it be shown 55 times, or 55*3, or just 3? How does this work?

@Override
protected void setup(Context context) throws IOException, InterruptedException
{
    Configuration conf = context.getConfiguration();
    DupleSplit fileSplit = (DupleSplit) context.getInputSplit();
    //first line is header. Indicates the first digit of the solution.
    context.nextKeyValue(); // <---- LINE 210
    URI[] uris = context.getCacheFiles();

    int num_of_colors = Integer.parseInt(conf.get("num_of_colors"));
    int order = fileSplit.get_order();
    int first_digit = Integer.parseInt(context.getCurrentValue().toString());

    //perm_path = conf.get(Integer.toString(num_of_colors - order - 1));
    int offset = Integer.parseInt(conf.get(Integer.toString(num_of_colors - order - 1)));
    uri = uris[offset]; // uri and perm_name are instance fields declared elsewhere in the class
    Path perm_path = new Path(uri.getPath());
    perm_name = perm_path.getName().toString();

    String pair_variables = "";
    for (int i = 1; i <= num_of_colors; i++)
        pair_variables += "X_" + i + "_" + (num_of_colors - order) + "\t";
    for (int i = 1; i < num_of_colors; i++)
        pair_variables += "X_" + i + "_" + (num_of_colors - order - first_digit) + "\t";
    pair_variables += "X_" + num_of_colors + "_" + (num_of_colors - order - first_digit);
    context.write(new Text(pair_variables), null);
}
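
For context, disabling splitting as described above is usually done by overriding isSplitable() in the input format. A minimal sketch, assuming a TextInputFormat-based format (the actual format that produces DupleSplit is custom and not shown here, so the class below is hypothetical):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Hypothetical input format: keeps each file in a single split, so every
// mapper sees the header line as its first record.
public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;
    }
}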

The error log is as follows:

Error: java.lang.IndexOutOfBoundsException
at java.nio.Buffer.checkBounds(Buffer.java:559)
at java.nio.ByteBuffer.get(ByteBuffer.java:668)
at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:279)
at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:168)
at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:59)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:91)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:144)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:184)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at produce_data_hdfs$input_mapper.setup(produce_data_hdfs.java:210)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

Answer 1 (by 5jdjgkvh)

I have never seen this method being called before, and it seems you don't even need it, since you don't store its result in any variable.
Why not just skip the first key/value pair inside the map() method instead? You can do that easily by keeping a counter, initializing it to 0 in setup() and incrementing it at the beginning of map(); then, when the counter equals 1, skip the map computation:

private int counter;

@Override
protected void setup(Context context) throws IOException, InterruptedException {
    counter = 0;
    // ... the rest of your existing setup code
}

@Override
protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    if (++counter == 1) {
        return; // first record is the file header, so skip it
    }
    // ... your existing map code goes here
}
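
Note that this only works reliably because splitting is disabled in your job, so each mapper reads a whole file starting at its header line; with splitting enabled, only the mapper that processes a file's first split would ever see the header.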

The error message is probably shown 68 times because it is shown once for each map task that could run concurrently (as many as there are map slots available in your cluster), and those tasks are then re-executed (twice each) until some of them fail, which fails the whole job (there is a threshold on how many tasks may fail before the entire job is declared failed).
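
For reference, both knobs involved here can be set on the job's Configuration. A minimal sketch, assuming Hadoop 2.x property names (mapreduce.map.maxattempts is the number of attempts per task; mapreduce.map.failures.maxpercent is the share of map tasks allowed to fail before the job itself fails); the job name is only an example taken from the stack trace above:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobSetup {
    public static Job createJob() throws IOException {
        Configuration conf = new Configuration();
        // Each map task gets at most this many attempts before it counts as failed (default 4).
        conf.setInt("mapreduce.map.maxattempts", 3);
        // Percentage of map tasks that may fail before the whole job is declared failed (default 0).
        conf.setInt("mapreduce.map.failures.maxpercent", 0);
        return Job.getInstance(conf, "produce_data_hdfs");
    }
}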


Answer 2 (by j8yoct9x)

I know this is a few years late, but for anyone seeing this: Hadoop 2.6 had an unsafe long-to-int conversion that caused this IndexOutOfBoundsException in many cases (note that the stack trace above goes through UncompressedSplitLineReader.fillBuffer, the code path that fix covers). I believe the patch was released with version 2.7.3. You can read about it at https://issues.apache.org/jira/browse/mapreduce-6635 (MAPREDUCE-6635). I hope this helps anyone running into this problem.
