In a Hadoop program, I am trying to compress the map output, so I wrote the following code:
conf.setBoolean("mapred.compress.map.output",true);
conf.setClass("mapred.map.output.compression.codec",GzipCodec.class,CompressionCodec.class);
When I run it, I get the exception below. Does anyone know the cause?
WARN mapred.LocalJobRunner: job_local1149103367_0001
java.io.IOException: not a gzip file
at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.processBasicHeader(BuiltInGzipDecompressor.java:495)
at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.executeHeaderState(BuiltInGzipDecompressor.java:256)
at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.decompress(BuiltInGzipDecompressor.java:185)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:72)
at java.io.DataInputStream.readByte(DataInputStream.java:265)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
at org.apache.hadoop.mapred.IFile$Reader.positionToNextRecord(IFile.java:400)
at org.apache.hadoop.mapred.IFile$Reader.nextRawKey(IFile.java:425)
at org.apache.hadoop.mapred.Merger$Segment.nextRawKey(Merger.java:323)
at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:613)
at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:558)
at org.apache.hadoop.mapred.Merger.merge(Merger.java:70)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:385)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:445)
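The top frames of the trace suggest that the reduce-side reader expected gzip data and rejected the stream while parsing its header. As a rough illustration of that kind of check, here is a minimal sketch using the JDK's own java.util.zip classes rather than Hadoop's BuiltInGzipDecompressor. The class and method names (GzipHeaderCheck, hasGzipMagic, jdkAcceptsAsGzip) are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Hadoop: shows how a gzip reader
// rejects data that was never gzip-compressed.
class GzipHeaderCheck {

    // A gzip stream starts with the two magic bytes 0x1f 0x8b (RFC 1952).
    static boolean hasGzipMagic(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xff) == 0x1f
                && (data[1] & 0xff) == 0x8b;
    }

    // Compresses a byte array with the JDK gzip writer.
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(plain);
        }
        return buffer.toByteArray();
    }

    // GZIPInputStream parses the header in its constructor and throws an
    // IOException subclass when the header is not valid gzip.
    static boolean jdkAcceptsAsGzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "key\tvalue".getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(plain);
        System.out.println("plain has magic:  " + hasGzipMagic(plain));
        System.out.println("packed has magic: " + hasGzipMagic(packed));
        System.out.println("plain accepted:   " + jdkAcceptsAsGzip(plain));
        System.out.println("packed accepted:  " + jdkAcceptsAsGzip(packed));
    }
}
```

If the same thing is happening in the job, the data handed to the merger was probably written without the gzip codec, or with a different one, while the reader was configured to expect gzip.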
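Today I tested it again and found that if I put those two lines before creating the Job object,
Job job = new Job(conf, "MyCounter");
the error occurs, but if I put them after it, no error occurs. Why does this happen?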
1 Answer
Are you using MRv1 or MRv2? If you are using MRv2, use the following job configuration.
config.setBoolean("mapreduce.output.fileoutputformat.compress", true);
config.setClass("mapreduce.output.fileoutputformat.compress.codec", GzipCodec.class, CompressionCodec.class);
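In addition, you can set config.set("mapreduce.output.fileoutputformat.compress.type", CompressionType.NONE.toString());
There are three compression types: BLOCK, NONE, and RECORD. This setting applies when the job output is written as a SequenceFile.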
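A note on the ordering question: Job takes its own copy of the Configuration it is given, so properties set on the original conf after the Job is created are not seen by the job. That would explain why the two lines only take effect, and the error only appears, when they come before the Job is created. Also, the mapreduce.output.fileoutputformat.* properties above control the job's final output. For the intermediate map output, the MRv2 property names are mapreduce.map.output.compress and mapreduce.map.output.compress.codec. The following is a minimal sketch under those assumptions. It needs the Hadoop 2+ client libraries on the classpath, and the class name MapOutputCompressionSketch is made up for this example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver fragment: only the configuration ordering is shown,
// mapper/reducer/input/output setup is left out.
class MapOutputCompressionSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Set map-output compression BEFORE the Job is created,
        // because the Job copies the Configuration at construction time.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                GzipCodec.class, CompressionCodec.class);

        Job job = Job.getInstance(conf, "MyCounter");

        // Changing conf here would no longer affect the job. Later changes
        // have to go through the job's own copy instead.
        job.getConfiguration().setBoolean("mapreduce.output.fileoutputformat.compress", true);

        // Check what the job actually sees.
        System.out.println(job.getConfiguration()
                .getBoolean("mapreduce.map.output.compress", false));
    }
}
```

Printing the value from job.getConfiguration() is a quick way to confirm whether a setting reached the job, which is usually the first thing to check when behavior depends on where a line is placed.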