2 Answers
igsr9ssn1#
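You need a `CompressionCodec` to decompress the file. The gzip implementation is `GzipCodec`. The codec gives you a `CompressionInputStream`, and you write the result out with plain I/O. For example, say you have a file `file.gz`: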
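```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

// path of the compressed file
String uri = "/uri/to/file.gz";
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(uri), conf);
Path inputPath = new Path(uri);

CompressionCodecFactory factory = new CompressionCodecFactory(conf);
// the correct codec is discovered from the file extension
CompressionCodec codec = factory.getCodec(inputPath);
if (codec == null) {
    System.err.println("No codec found for " + uri);
    System.exit(1);
}

// remove the .gz extension
String outputUri =
    CompressionCodecFactory.removeSuffix(uri, codec.getDefaultExtension());

InputStream is = codec.createInputStream(fs.open(inputPath));
OutputStream out = fs.create(new Path(outputUri));
IOUtils.copyBytes(is, out, conf);

// close streams
IOUtils.closeStream(is);
IOUtils.closeStream(out);
```

To handle every file in a directory, list its `FileStatus` entries and look up a codec for each one:

```java
// also needs: import org.apache.hadoop.fs.FileStatus;
// factory is the CompressionCodecFactory created above
FileSystem fs = FileSystem.get(new Configuration());
FileStatus[] statuses = fs.listStatus(new Path("hdfs/path/to/dir"));
for (FileStatus status : statuses) {
    CompressionCodec codec = factory.getCodec(status.getPath());
    ...
    InputStream is = codec.createInputStream(fs.open(status.getPath()));
    ...
}
```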
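If you want to try the decompress-and-copy pattern on a local file without a Hadoop cluster, it can be sketched with only the JDK. This is a minimal sketch, not the Hadoop API: it swaps in `java.util.zip.GZIPInputStream` for the codec, assumes Java 11 or later, and the names `LocalGunzip`, `gunzip`, and `removeSuffix` are hypothetical helpers made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class; plain JDK only, no Hadoop dependencies.
final class LocalGunzip {

    // Plays the role of the suffix-stripping step in the Hadoop version:
    // returns the name without the suffix, or the name unchanged if absent.
    public static String removeSuffix(String name, String suffix) {
        return name.endsWith(suffix)
                ? name.substring(0, name.length() - suffix.length())
                : name;
    }

    // Decompresses a local .gz file next to the original and returns the new path.
    public static Path gunzip(Path input) throws IOException {
        String name = input.getFileName().toString();
        if (!name.endsWith(".gz")) {
            // Refuse early so the input is never overwritten by its own output.
            throw new IllegalArgumentException("Not a .gz file: " + input);
        }
        Path output = input.resolveSibling(removeSuffix(name, ".gz"));
        // try-with-resources closes both streams, even if the copy fails.
        try (InputStream in = new GZIPInputStream(Files.newInputStream(input));
             OutputStream out = Files.newOutputStream(output)) {
            in.transferTo(out); // available since Java 9
        }
        return output;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Wrote " + gunzip(Path.of(args[0])));
    }
}
```

The try-with-resources block does the same job as the explicit stream closing in the Hadoop snippet, so nothing leaks if the copy throws.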
r8xiu3jd2#
I use an identity-map Hadoop job to change the compression, split size, and so on.
The abstract class with the general configuration: