NegativeArraySizeException when creating a SequenceFile with a large (>1 GB) BytesWritable value

Posted by kxkpmulp on 2021-06-02 in Hadoop


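I have tried different ways to create a large Hadoop SequenceFile with just one short (<100 bytes) key and one large (>1 GB) value (BytesWritable).
The following example works out of the box:
https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/bigmapoutput.java
It writes multiple keys and values of random length, with a total size greater than 3 GB.
However, that is not what I am trying to do, so I modified it using the Hadoop 2.2.0 API to something like this: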

Path file = new Path("/input");
SequenceFile.Writer writer = SequenceFile.createWriter(conf,
    SequenceFile.Writer.file(file),
    SequenceFile.Writer.compression(CompressionType.NONE),
    SequenceFile.Writer.keyClass(BytesWritable.class),
    SequenceFile.Writer.valueClass(BytesWritable.class));
int numBytesToWrite = fileSizeInMB * 1024 * 1024;
BytesWritable randomKey = new BytesWritable();
BytesWritable randomValue = new BytesWritable();
randomKey.setSize(1);
// This call is where the NegativeArraySizeException is thrown for large sizes.
randomValue.setSize(numBytesToWrite);
randomizeBytes(randomValue.getBytes(), 0, randomValue.getLength());
// A single record: one tiny key and one very large value.
writer.append(randomKey, randomValue);
writer.close();

When fileSizeInMB > 700, I get an error like the following:

java.lang.NegativeArraySizeException
        at  org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
        at  org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
        ...

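I have seen this error discussed, but have not found any resolution. Note that a Java int (maximum 2^31 - 1) can represent sizes up to about 2 GB, so this should not fail at 700 MB.
If you know another way to create a SequenceFile with such a large value, please let me know. I tried other approaches, such as using IOUtils.read to read from an InputStream into a byte[], but ran into heap size errors or an OOME.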

hadoop sequencefile out-of-memory heap large-files

Source: https://stackoverflow.com/questions/24127304/negativearraysizeexception-when-creating-a-sequencefile-with-large-1gb-bytesw

2 Answers


nkkqxpd91#

Just use ArrayPrimitiveWritable instead.
Setting a new capacity in BytesWritable causes an int overflow here:

public void setSize(int size) {
    if (size > getCapacity()) {
        setCapacity(size * 3 / 2);
    }
    this.size = size;
}

700 MB * 3 > 2 GB = int overflow!
That is why you cannot deserialize (but can write and serialize) a BytesWritable larger than about 700 MB.
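
To make the arithmetic concrete, here is a minimal standalone sketch that reproduces only the capacity expression from the setSize above, without any Hadoop dependency. The class and method names (CapacityOverflowDemo, legacyCapacity, largestSafeSize) are hypothetical and exist only for illustration; they are not part of the Hadoop API.

```java
// Standalone sketch: reproduces the capacity arithmetic only, no Hadoop needed.
// All names here are illustrative assumptions, not Hadoop classes or methods.
class CapacityOverflowDemo {

    // Same expression as in the setSize shown above: size * 3 / 2, evaluated in int.
    // The multiplication happens first, so it can wrap around before the division.
    static int legacyCapacity(int size) {
        return size * 3 / 2;
    }

    // Largest size for which size * 3 still fits in a signed 32-bit int.
    static int largestSafeSize() {
        return Integer.MAX_VALUE / 3;
    }

    public static void main(String[] args) {
        int requested = 700 * 1024 * 1024; // 700 MiB, as in the question
        System.out.println("requested size    = " + requested);
        System.out.println("legacy capacity   = " + legacyCapacity(requested));
        System.out.println("largest safe size = " + largestSafeSize());
    }
}
```

For 700 MiB (734003200 bytes) the computed capacity is -1046478848, which is what ends up being used as an array length. The largest size that survives the multiplication is 715827882 bytes, roughly 683 MiB, which matches the observed failure at around 700 MB.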

2021-06-03

zsohkypk2#

If you want to keep using BytesWritable, one option is to set the capacity high enough beforehand, so that you can use up to 2 GB rather than just 700 MB:

randomValue.setCapacity(numBytesToWrite);
randomValue.setSize(numBytesToWrite); // will not resize now

This bug was recently fixed in Hadoop, so in newer versions it works even without this workaround:

public void setSize(int size) {
  if (size > getCapacity()) {
    // Avoid overflowing the int too early by casting to a long.
    long newSize = Math.min(Integer.MAX_VALUE, (3L * size) / 2L);
    setCapacity((int) newSize);
  }
  this.size = size;
}
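
As a rough check of how the fixed expression behaves, the sketch below copies just the widened calculation into a standalone class and evaluates it for a few sizes. The names ClampedCapacityDemo and fixedCapacity are hypothetical, chosen for this example only, and nothing here calls into Hadoop.

```java
// Standalone sketch of the fixed growth calculation; no Hadoop dependency.
// Class and method names are illustrative assumptions.
class ClampedCapacityDemo {

    // Mirrors the fixed expression: multiply as long, then clamp to the int range.
    static int fixedCapacity(int size) {
        long newSize = Math.min(Integer.MAX_VALUE, (3L * size) / 2L);
        return (int) newSize;
    }

    public static void main(String[] args) {
        int sevenHundredMiB = 700 * 1024 * 1024;  // the size from the question
        int oneAndHalfGiB = 1536 * 1024 * 1024;   // 1.5 GiB still fits in an int

        System.out.println("700 MiB -> " + fixedCapacity(sevenHundredMiB));
        System.out.println("1.5 GiB -> " + fixedCapacity(oneAndHalfGiB));
        System.out.println("max int -> " + fixedCapacity(Integer.MAX_VALUE));
    }
}
```

With the long arithmetic, 700 MiB grows to 1101004800 bytes instead of wrapping negative, and any request whose 1.5x growth would exceed the int range is clamped to Integer.MAX_VALUE. Whether the JVM can actually allocate an array that large still depends on the available heap.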

2021-06-03
