/**
 * Supported compression algorithms.
 *
 * Codecs added in 2.4 can be read by readers based on 2.4 and later.
 * Codec support may vary between readers based on the format version and
 * libraries available at runtime. Gzip, Snappy, and LZ4 codecs are
 * widely available, while Zstd and Brotli require additional libraries.
 */
enum CompressionCodec {
  UNCOMPRESSED = 0;
  SNAPPY = 1;
  GZIP = 2;
  LZO = 3;
  BROTLI = 4; // Added in 2.4
  LZ4 = 5;    // Added in 2.4
  ZSTD = 6;   // Added in 2.4
}
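To illustrate the version rule in the comment above, here is a minimal Python sketch that mirrors the Thrift enum values. The ADDED_IN table and the readable_by helper are hypothetical names introduced for illustration, not part of any Parquet library. The only facts encoded are the numeric values and the "Added in 2.4" annotations; codecs without an annotation are treated as having no stated minimum version.

```python
from enum import IntEnum
from typing import Tuple


class CompressionCodec(IntEnum):
    """Python mirror of the Thrift enum; values must match the file format."""
    UNCOMPRESSED = 0
    SNAPPY = 1
    GZIP = 2
    LZO = 3
    BROTLI = 4  # Added in 2.4
    LZ4 = 5     # Added in 2.4
    ZSTD = 6    # Added in 2.4


# Hypothetical table: format version (major, minor) that introduced a codec.
ADDED_IN = {
    CompressionCodec.BROTLI: (2, 4),
    CompressionCodec.LZ4: (2, 4),
    CompressionCodec.ZSTD: (2, 4),
}


def readable_by(codec: CompressionCodec, reader_version: Tuple[int, int]) -> bool:
    """Return True if a reader based on reader_version can decode codec."""
    # Codecs with no "Added in" note get (0, 0), i.e. no stated minimum.
    return reader_version >= ADDED_IN.get(codec, (0, 0))


print(readable_by(CompressionCodec.ZSTD, (2, 3)))    # False
print(readable_by(CompressionCodec.ZSTD, (2, 4)))    # True
print(readable_by(CompressionCodec.SNAPPY, (1, 0)))  # True
```

Using IntEnum keeps the Python values comparable to the raw integers stored in file metadata, so CompressionCodec(6) decodes directly to ZSTD.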
2 Answers
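The compression types supported by Apache Parquet are defined in the parquet-format repository: https://github.com/apache/parquet-format/blob/54e6133e887a6ea90501ddd72fff5312b7038a7c/src/main/thrift/parquet.thrift#l461
Snappy and Gzip are the most commonly used codecs, and all implementations support them. LZ4 and ZSTD give better results than those two, but they are a relatively recent addition to the format, so only newer versions of some implementations support them.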
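The trade-off described above can be sketched as a simple selection rule: use a newer codec only when every consumer of the files can read it, otherwise fall back to one that all implementations support. This is a hypothetical helper in plain Python, not part of Parquet or Spark; the input format (one set of codec names per reader) and the preference of zstd over lz4 are assumptions made for illustration.

```python
# Codecs the answer describes as supported by every implementation.
UNIVERSAL = ("snappy", "gzip")
# Newer codecs are tried first; the zstd-before-lz4 order is an assumption.
PREFERRED = ("zstd", "lz4") + UNIVERSAL


def pick_codec(reader_supported):
    """Return the first preferred codec that every target reader supports.

    reader_supported is a list of sets, one per consumer, holding the
    lower-case codec names that consumer can decode (hypothetical input).
    """
    for codec in PREFERRED:
        if all(codec in supported for supported in reader_supported):
            return codec
    # Fall back to a codec the answer says all implementations handle.
    return UNIVERSAL[0]


new_reader = {"snappy", "gzip", "lz4", "zstd"}
old_reader = {"snappy", "gzip"}
print(pick_codec([new_reader]))              # zstd
print(pick_codec([new_reader, old_reader]))  # snappy
```

The rule is driven by the oldest reader: a single consumer on an older implementation pulls the whole choice back to Snappy or Gzip.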
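In Spark 2.1
From the Spark source code, branch 2.1:
You can set the following Parquet-specific option(s) for writing Parquet files:
compression (default is the value specified in spark.sql.parquet.compression.codec): compression codec to use when saving to file. This can be one of the known case-insensitive shortened names (none, snappy, gzip, and lzo). This will override spark.sql.parquet.compression.codec.
...
In Spark 2.4/3.0
The overall supported compression codecs are: none, uncompressed, snappy, gzip, lzo, brotli, lz4, and zstd.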
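Both quotes describe the same resolution rule: a per-write compression option, if given, overrides the session-wide spark.sql.parquet.compression.codec setting, and the name is matched case-insensitively against a version-dependent list. The sketch below encodes that rule in plain Python as an illustration only; it is not Spark's implementation, and resolve_codec, ALLOWED, and THRIFT_NAME are hypothetical names. Mapping both none and uncompressed to the Thrift UNCOMPRESSED value is an assumption based on their meaning, and for simplicity the sketch checks whichever name wins against the list quoted for that version, which may be stricter than Spark itself.

```python
from typing import Optional

# Short names quoted above; 2.4 and 3.0 share one list.
_V24_NAMES = ("none", "uncompressed", "snappy", "gzip", "lzo",
              "brotli", "lz4", "zstd")
ALLOWED = {
    "2.1": ("none", "snappy", "gzip", "lzo"),
    "2.4": _V24_NAMES,
    "3.0": _V24_NAMES,
}

# Assumed mapping from Spark short names to Parquet Thrift enum names.
THRIFT_NAME = {
    "none": "UNCOMPRESSED",
    "uncompressed": "UNCOMPRESSED",
    "snappy": "SNAPPY",
    "gzip": "GZIP",
    "lzo": "LZO",
    "brotli": "BROTLI",
    "lz4": "LZ4",
    "zstd": "ZSTD",
}


def resolve_codec(spark_version: str, session_conf: str,
                  write_option: Optional[str] = None) -> str:
    """Return the Thrift codec name a Parquet write would use.

    The per-write option, when set, overrides the session setting;
    names are matched case-insensitively.
    """
    name = (write_option if write_option is not None else session_conf).lower()
    if name not in ALLOWED[spark_version]:
        raise ValueError(f"codec {name!r} not accepted by Spark {spark_version}")
    return THRIFT_NAME[name]


print(resolve_codec("2.1", "snappy"))                       # SNAPPY
print(resolve_codec("2.1", "snappy", write_option="GZIP"))  # GZIP
print(resolve_codec("3.0", "zstd"))                         # ZSTD
```

In actual PySpark code, the per-write option is set with df.write.option("compression", "gzip").parquet(path), and the session default with spark.conf.set("spark.sql.parquet.compression.codec", "snappy").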