我有一个奇怪的错误,已经开始发生在几个星期前。我们不得不更换几个分析节点,hive调用的hadoop作业都无法完成。它们在不同的阶段崩溃,出现了类似的错误:
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: ip-x-x-x-x.ec2.internal/x.x.x.x:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read))
at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:256)
at com.datastax.driver.core.ArrayBackedResultSet$MultiPage.prepareNextRow(ArrayBackedResultSet.java:259)
at com.datastax.driver.core.ArrayBackedResultSet$MultiPage.isExhausted(ArrayBackedResultSet.java:222)
at com.datastax.driver.core.ArrayBackedResultSet$1.hasNext(ArrayBackedResultSet.java:115)
at org.apache.cassandra.hadoop.cql3.CqlRecordReader$RowIterator.computeNext(CqlRecordReader.java:239)
at org.apache.cassandra.hadoop.cql3.CqlRecordReader$RowIterator.computeNext(CqlRecordReader.java:218)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.apache.cassandra.hadoop.cql3.CqlRecordReader.getProgress(CqlRecordReader.java:152)
at org.apache.hadoop.hive.cassandra.cql3.input.CqlHiveRecordReader.getProgress(CqlHiveRecordReader.java:62)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.getProgress(HiveRecordReader.java:71)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.getProgress(MapTask.java:260)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:233)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:260)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: ip-x-x-x-x.ec2.internal/x.x.x.x:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read))
at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:175)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
我打开了调试日志,但仍然找不到当时发生的任何事情。
谢谢!
1条答案
按热度按时间tmb3ates1#
实际上,问题在于应用程序将大量数据写入其中一个map列。它恰好与集群更新同时发生。Hive作业刚刚挂起,出现了一条误导性的错误消息。通过一些尝试和错误,我能够将问题缩小到Map栏,并删除有问题的数据。