set hive.merge.mapfiles=true; -- Merge small files at the end of a map-only job.
set hive.merge.mapredfiles=true; -- Merge small files at the end of a map-reduce job.
set hive.merge.size.per.task=???; -- Size (bytes) of merged files at the end of the job.
set hive.merge.smallfiles.avgsize=???; -- Average output file size (bytes) threshold.
-- When the average output file size of a job is less than this number,
-- Hive will start an additional map-reduce job to merge the output files
-- into bigger files. This is only done for map-only jobs if hive.merge.mapfiles
-- is true, and for map-reduce jobs if hive.merge.mapredfiles is true.
2 Answers
kqhtkvqz1#
See https://community.cloudera.com/t5/support-questions/hive-multiple-small-files/td-p/204038
sigwle7e2#
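Limiting the number of output files means limiting the number of reducers. You can do that with the mapred.reduce.tasks property, but it may affect the performance of your query.
Alternatively, you can run the getmerge command from the HDFS shell once the query has finished. This command takes a source directory and a destination file as input and concatenates the files in the source directory into a single destination file on the local filesystem.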
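As a minimal sketch of the first option, the reducer count can be capped from the Hive shell before the query runs. The value 5 and the table and column names below are placeholders chosen for illustration, not values from the original answer. Newer Hadoop releases call the same property mapreduce.job.reduces.

```sql
-- Assumption: 5 is an arbitrary illustrative value; choose one that suits the data volume.
set mapred.reduce.tasks=5;

-- Hypothetical query with a reduce stage; each reducer writes one output file.
INSERT OVERWRITE TABLE target_table
SELECT key, count(*) FROM source_table GROUP BY key;
```

For the second option, the HDFS shell command has the general form hadoop fs -getmerge <src> <localdst>, where <src> is the HDFS directory holding the query output and <localdst> is a file on the local filesystem.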
HTH