Slow Hive query performance

xqnpmsa8 · posted 2021-06-03 in Hadoop

I'm running into a strange problem, and I promise I have already googled this extensively.
I'm running a set of AWS Elastic MapReduce clusters, and I have a Hive table with about 16 partitions. The partitions were created with emr-s3distcp (because there are roughly 216k files in the original S3 bucket), using --groupBy and a size limit of 64 MiB (the DFS block size in this case). The files are plain text with one JSON object per line, and the table reads them through a JSON SerDe.
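For reference, the consolidation step looks roughly like the sketch below, using the elastic-mapreduce CLI of that era; the jobflow id, bucket, destination path, and --groupBy pattern are placeholders rather than values from this post:

elastic-mapreduce --jobflow j-XXXXXXXXXXXX \
  --jar /home/hadoop/lib/emr-s3distcp-1.0.jar \
  --args '--src,s3://my-source-bucket/events/,--dest,hdfs:///data/events/,--groupBy,.*/(\d{4}/\d{2}/\d{2})/.*,--targetSize,64'

s3distcp's --targetSize is given in mebibytes, so a value of 64 lines the grouped files up with the 64 MiB DFS block size.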
When I run my query script against this table, it takes a very long time and then gives up because of IPC connection failures.
Initially, the copy from s3distcp into HDFS put the cluster under a lot of pressure, so I took a few steps: I resized to higher-capacity instances, set DFS replication to 3x (it is a small cluster; the EMR default for a cluster this small is 2, but I changed it to 3), and set the block size to 64 MiB. That worked, and the count of under-replicated blocks dropped to zero.
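For completeness, the replication and block-size changes boil down to something like the following (a sketch assuming the Hadoop 1.x property names used on EMR at the time, and a placeholder /data path):

<!-- hdfs-site.xml -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.block.size</name>
  <value>67108864</value> <!-- 64 MiB, in bytes -->
</property>

# apply the new factor to data already in HDFS, then check for under-replicated blocks
hadoop fs -setrep -R 3 /data
hadoop fsck / | grep -i "under-replicated"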
Looking at /mnt/var/log/apps/hive_081.log shows lines like the following:

2013-05-12 09:56:12,120 DEBUG org.apache.hadoop.ipc.Client (Client.java:<init>(222)) - The ping interval is60000ms.
2013-05-12 09:56:12,120 DEBUG org.apache.hadoop.ipc.Client (Client.java:<init>(265)) - Use SIMPLE authentication for protocol ClientProtocol
2013-05-12 09:56:12,120 DEBUG org.apache.hadoop.ipc.Client (Client.java:setupIOstreams(551)) - Connecting to /10.17.17.243:9000
2013-05-12 09:56:12,121 DEBUG org.apache.hadoop.ipc.Client (Client.java:sendParam(769)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop sending #14
2013-05-12 09:56:12,121 DEBUG org.apache.hadoop.ipc.Client (Client.java:run(742)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop: starting, having connections 2
2013-05-12 09:56:12,125 DEBUG org.apache.hadoop.ipc.Client (Client.java:receiveResponse(804)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop got value #14
2013-05-12 09:56:12,126 DEBUG org.apache.hadoop.ipc.RPC (RPC.java:invoke(228)) - Call: getFileInfo 6
2013-05-12 09:56:21,523 INFO  org.apache.hadoop.ipc.Client (Client.java:handleConnectionFailure(663)) - Retrying connect to server: domU-12-31-39-10-81-2A.compute-1.internal/10.198.130.216:9000. Already tried 6 time(s).
2013-05-12 09:56:22,122 DEBUG org.apache.hadoop.ipc.Client (Client.java:close(876)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop: closed
2013-05-12 09:56:22,122 DEBUG org.apache.hadoop.ipc.Client (Client.java:run(752)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop: stopped, remaining connections 1
2013-05-12 09:56:42,544 INFO  org.apache.hadoop.ipc.Client (Client.java:handleConnectionFailure(663)) - Retrying connect to server: domU-12-31-39-10-81-2A.compute-1.internal/10.198.130.216:9000. Already tried 7 time(s).

And so on, until one of the clients hits its retry limit.
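As far as I can tell, the limit being hit is the IPC client's connect-retry count, which corresponds to something like this in core-site.xml (default 10); raising it would only delay the failure rather than fix it:

<!-- core-site.xml -->
<property>
  <name>ipc.client.connect.max.retries</name>
  <value>10</value> <!-- connection attempts before the IPC client gives up -->
</property>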
What does it take to fix this in Hive under Elastic MapReduce?
Thanks.

bvn4nwqk

After a while I noticed that the offending IP address wasn't even part of my cluster, so it was the Hive metastore holding on to a stale location. I fixed it like this:

CREATE TABLE whatever_2 LIKE whatever LOCATION '<hdfs_location>';

ALTER TABLE whatever_2 RECOVER PARTITIONS;
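To verify the fix, something like this should show the recreated table pointing at the current namenode and the partitions back in place (assuming the table name above):

-- check the table location and the recovered partitions
DESCRIBE EXTENDED whatever_2;
SHOW PARTITIONS whatever_2;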

Hope that helps.
