Create a table in Hive with one file

Posted by 2admgd59 on 2021-06-26 in Hive

I am creating a new table in Hive using:

CREATE TABLE new_table AS select * from old_table;

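My problem is that after the table is created, it produces multiple files for each partition, and I want only one file per partition.
How can I define this in the table? Thank you!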

Tags: hive, create-table, hive-partitions, hiveddl

Source: https://stackoverflow.com/questions/45265339/create-table-in-hive-with-one-file

1 Answer

Answer #1 by k5hmc34c:

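There are several possible solutions:
1) Add DISTRIBUTE BY partition_key at the end of the query. Without it, a single reducer may receive rows from many partitions, and each reducer creates its own file for every partition it writes to. DISTRIBUTE BY on the partition key sends all rows of a given partition to the same reducer, so each partition is written as one file. This also reduces the total number of files and memory consumption. The hive.exec.reducers.bytes.per.reducer setting defines how much data each reducer processes.
2) Simple, and fine if there is not much data: add ORDER BY to force a single reducer. Alternatively, increase hive.exec.reducers.bytes.per.reducer, for example set hive.exec.reducers.bytes.per.reducer=500000000; (about 500 MB per reducer), so that fewer reducers, and therefore fewer files, are produced. The single-reducer approach is meant for small data; with a lot of data it will run slowly.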
If your job is map-only (it has no reduce stage), it is better to consider options 3 to 5:
3) If running on MapReduce, turn on merging:

set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.merge.size.per.task=500000000;  --Size of merged files at the end of the job
set hive.merge.smallfiles.avgsize=500000000; --When the average output file size of a job is less than this number, 
--Hive will start an additional map-reduce job to merge the output files into bigger files

4) If running on Tez:

set hive.merge.tezfiles=true; 
set hive.merge.size.per.task=500000000;
set hive.merge.smallfiles.avgsize=500000000;

5) For ORC tables, you can merge files efficiently with this command: ALTER TABLE T [PARTITION partition_spec] CONCATENATE;
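Applied to the question, option 1 could look roughly like the minimal sketch below. It assumes that old_table is partitioned by a hypothetical column dt and that dynamic partitioning is allowed on the cluster. Because older Hive versions reject a partitioned target table in CREATE TABLE AS SELECT, the sketch copies the schema with CREATE TABLE ... LIKE and then inserts. Exact behavior can vary with the Hive version and execution engine.

```sql
-- Assumption: old_table is partitioned by a hypothetical column dt.
-- Copy the table definition (columns, partitioning, storage format) without data.
CREATE TABLE new_table LIKE old_table;

-- Let Hive take all partition values from the query (dynamic partitioning).
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- SELECT * returns partition columns last, matching PARTITION (dt).
-- DISTRIBUTE BY dt routes every row of one partition to the same reducer,
-- so each partition is written by exactly one reducer, as one file.
INSERT OVERWRITE TABLE new_table PARTITION (dt)
SELECT * FROM old_table
DISTRIBUTE BY dt;

-- Optional, ORC (or RCFile) tables only: merge leftover small files in a partition.
-- The partition value below is a hypothetical placeholder.
ALTER TABLE new_table PARTITION (dt='2021-01-01') CONCATENATE;
```

Unlike ORDER BY in option 2, DISTRIBUTE BY keeps the job parallel across partitions; the trade-off is that one very large partition is still handled by a single reducer.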

Answered 2021-06-26