Intermittent timeout errors in DSE Hadoop during reads

Posted by kqlmhetl on 2021-06-02 in Hadoop

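I have a strange error that started happening a few weeks ago. We had to replace several analytics nodes, and now none of the Hadoop jobs invoked by Hive can complete. They crash at different stages with similar errors: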

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: ip-x-x-x-x.ec2.internal/x.x.x.x:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read))
    at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
    at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:256)
    at com.datastax.driver.core.ArrayBackedResultSet$MultiPage.prepareNextRow(ArrayBackedResultSet.java:259)
    at com.datastax.driver.core.ArrayBackedResultSet$MultiPage.isExhausted(ArrayBackedResultSet.java:222)
    at com.datastax.driver.core.ArrayBackedResultSet$1.hasNext(ArrayBackedResultSet.java:115)
    at org.apache.cassandra.hadoop.cql3.CqlRecordReader$RowIterator.computeNext(CqlRecordReader.java:239)
    at org.apache.cassandra.hadoop.cql3.CqlRecordReader$RowIterator.computeNext(CqlRecordReader.java:218)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.hadoop.cql3.CqlRecordReader.getProgress(CqlRecordReader.java:152)
    at org.apache.hadoop.hive.cassandra.cql3.input.CqlHiveRecordReader.getProgress(CqlHiveRecordReader.java:62)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.getProgress(HiveRecordReader.java:71)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.getProgress(MapTask.java:260)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:233)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:260)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: ip-x-x-x-x.ec2.internal/x.x.x.x:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read))
    at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
    at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:175)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

I turned on debug logging, but I still can't find anything in the logs from the time of the failures.
Thanks!

tmb3ates1#

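As it turned out, the problem was that an application had been writing a large amount of data into one of the map columns. That just happened to coincide with the cluster update. The Hive jobs simply hung and then failed with a misleading error message. Through some trial and error, I was able to narrow the problem down to the map column and delete the offending data.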
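
For anyone who hits the same thing, here is a rough sketch of how the narrowing-down step could be automated instead of done by trial and error. It is not what I actually ran. It assumes the rows have already been read into plain dicts by some client, and the column names (`id`, `attrs`) and the 1 KiB threshold are made up for illustration. The size estimate is only an approximation based on the text form of keys and values, not the real on-disk size.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def approx_map_size(value: Optional[Dict[Any, Any]]) -> int:
    """Rough size in bytes of a map column value.

    Uses the UTF-8 length of each key and value rendered as text.
    This is an approximation, not the serialized size in Cassandra.
    """
    if not value:
        return 0
    return sum(
        len(str(k).encode("utf-8")) + len(str(v).encode("utf-8"))
        for k, v in value.items()
    )


def find_oversized_maps(
    rows: Iterable[Dict[str, Any]],
    key_column: str,
    map_column: str,
    max_bytes: int,
) -> List[Tuple[Any, int, int]]:
    """Return (key, entry_count, approx_bytes) for rows whose map column
    exceeds max_bytes, largest first.

    key_column and map_column are hypothetical names supplied by the caller.
    """
    suspects = []
    for row in rows:
        value = row.get(map_column) or {}
        size = approx_map_size(value)
        if size > max_bytes:
            suspects.append((row[key_column], len(value), size))
    # Largest offenders first, so the worst rows are inspected first.
    suspects.sort(key=lambda s: s[2], reverse=True)
    return suspects


# Hypothetical sample data standing in for rows read from the table.
sample_rows = [
    {"id": "a", "attrs": {"k1": "v1"}},
    {"id": "b", "attrs": {str(i): "x" * 100 for i in range(1000)}},
    {"id": "c", "attrs": None},
]

for key, entries, size in find_oversized_maps(sample_rows, "id", "attrs", 1024):
    print(f"row {key}: {entries} map entries, ~{size} bytes")
```

On a real cluster the scan itself can time out on the same rows, so it would probably have to be run over small token ranges or with a small page size, and a range that fails to read is already a hint about where the bad data lives.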
