HiveServer2 (on Spark) - "mapred.FileInputFormat: Total input files to process" - why is it single-threaded?

Posted by h5qlskok on 2021-05-27 in Hadoop

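I am running Hive (v2.3.4) on Spark (as the execution engine). My external Hive table is stored in Parquet format on S3 and spans 100 partitions. The following settings are all set to 20: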

    hive.exec.input.listing.max.threads
    mapred.dfsclient.parallelism.max
    mapreduce.input.fileinputformat.list-status.num-threads
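
For reference, a minimal sketch of how these three properties might be applied and checked in a Hive session (for example through Beeline). It assumes the values are set per session rather than in hive-site.xml; the question does not say which was used.

```sql
-- Assumption: the values are applied per session; the question does not say where they were set.
SET hive.exec.input.listing.max.threads=20;
SET mapred.dfsclient.parallelism.max=20;
SET mapreduce.input.fileinputformat.list-status.num-threads=20;

-- SET with a key and no value prints the value currently in effect for the session,
-- which helps confirm that the query actually ran with the intended settings.
SET hive.exec.input.listing.max.threads;
SET mapred.dfsclient.parallelism.max;
SET mapreduce.input.fileinputformat.list-status.num-threads;
```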

Running a simple query:

select * from s.t where h_code = 'KGD78' and h_no = '265'

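I can see the following in the HiveServer2 log (it continues for more than 1000 lines, listing all the different partitions). Why are the files not listed in parallel? The listing alone takes more than 5 minutes.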

2019-03-29T11:29:26,866  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] compress.CodecPool: Got brand-new decompressor [.snappy]
2019-03-29T11:29:27,283  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:27,797  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:28,374  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:28,919  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:29,483  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:30,003  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:30,518  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:31,001  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:31,549  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:32,048  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:32,574  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:33,130  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:33,639  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:34,189  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:34,743  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:35,208  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:35,701  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:36,183  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:36,662  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:37,154  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:37,645  INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1

I have tried setting

    hive.exec.input.listing.max.threads
    mapred.dfsclient.parallelism.max
    mapreduce.input.fileinputformat.list-status.num-threads

to the default, 1, 50… and the result is still the same.
