Hive results save as Parquet file

Posted by cl25kdpy on 2021-06-28 in Hive


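I am trying to create a snappy.parquet file from a Hive table. It is a large partitioned table, and I only need a small part of it. This is what I am doing: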

set parquet.compression=SNAPPY;
set hive.exec.compress.output=true;
set hive.exec.compress.intermediate=true;
set hive.exec.parallel=true;
set mapred.output.compress=true;
set mapreduce.output.fileoutputformat.compress=true;
set mapred.compress.map.output=true;
set mapreduce.map.output.compress=true;
set mapred.output.compression.type=BLOCK;
set mapreduce.output.fileoutputformat.compress.type=BLOCK;
set io.seqfile.compression.type = BLOCK;
insert overwrite directory 'EXTERNAL_DIRECTORY' STORED AS PARQUET select * from SOURCE_TABLE;

It creates a 0000000 file with the following schema:

message hive_schema {
optional int32 _col0;
optional binary _col1 (UTF8);
optional binary _col2 (UTF8);
optional binary _col3 (UTF8);
optional binary _col4 (UTF8);
optional binary _col5 (UTF8);
optional binary _col6 (UTF8);
optional binary _col7 (UTF8);
optional int64 _col8;
optional int64 _col9;
optional int64 _col10;
}

All column names from the source table are lost. How do I save it properly so that it can be used as a Hive table later?

Hive parquet snappy

Source: https://stackoverflow.com/questions/39510247/hive-results-save-as-parquet-file

1 Answer


eh57zj3b #1

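I would create a new external table for your data set and fill it by selecting all the data from the source partition you are looking for. You then have both a table and files that you can use. As of now you cannot run a CREATE TABLE AS SELECT statement against an external table, so you need to create the table first and then load the data into it.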

create external table yourNewTable ( use your source table DDL...)
  stored as parquet location '/yourNewLocation';

insert into yourNewTable
  select * from yourSourceTable where yourPartitionedFieldNames = 'desiredPartitionName';
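To make the pattern above more concrete, here is a minimal sketch with the DDL placeholder filled in. The column names (id, name, event_ts) and the partition column dt are hypothetical stand-ins, not the asker's real schema. The sketch also assumes that the Parquet table property parquet.compression is what selects Snappy for the written files, and that the source partition column is declared as an ordinary column in the new table.

```sql
-- Hypothetical schema: id, name, event_ts and dt are illustrative only;
-- replace them with the columns from the real source table DDL.
CREATE EXTERNAL TABLE yourNewTable (
  id       INT,
  name     STRING,
  event_ts BIGINT,
  dt       STRING   -- partition column of the source, kept here as a plain column
)
STORED AS PARQUET
LOCATION '/yourNewLocation'
TBLPROPERTIES ('parquet.compression'='SNAPPY');

-- List the columns explicitly so their order matches the DDL above.
INSERT INTO TABLE yourNewTable
SELECT id, name, event_ts, dt
FROM yourSourceTable
WHERE dt = 'desiredPartitionName';
```

Because the data is written through a table definition rather than straight to a directory, the files under /yourNewLocation should carry the declared column names instead of _col0, _col1, and so on.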

Answered on 2021-06-28
