How to fix a read timeout exception in the Spark Cassandra Connector

Posted by kqhtkvqz on 2021-06-14 in Cassandra


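I am using Spark 2.4 and Scala 2.11 on the Azure Databricks platform, with DSE 6.0.7 and Spark Cassandra Connector version 2.4.0.
I get the error below when counting the rows of one of my tables, which holds about 100 million records. One of our applications needs an exact row count. Here is my code: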

val count = spark.read
  .format("org.apache.spark.sql.cassandra")
  .option("table", tableName)
  .option("keyspace", keyspace)
  .load()
  .count()

Here is the exception:

java.io.IOException: Exception during execution of SELECT count(*) FROM "mykeyspace"."mytable" WHERE token("id") > ? AND token("id") <= ?   ALLOW FILTERING: [/host:9042] Timed out waiting for server response
  at com.datastax.spark.connector.rdd.CassandraTableScanRDD.com$datastax$spark$connector$rdd$CassandraTableScanRDD$$fetchTokenRange(CassandraTableScanRDD.scala:350)
  at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:367)
  at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:367)
  at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
  at com.datastax.spark.connector.util.CountingIterator.hasNext(CountingIterator.scala:12)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithoutKey_0$(Unknown Source)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:634)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
  at org.apache.spark.scheduler.Task.doRunTask(Task.scala:139)
  at org.apache.spark.scheduler.Task.run(Task.scala:112)
  at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:497)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1432)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:503)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: com.datastax.driver.core.exceptions.OperationTimedOutException: [/host:9042] Timed out waiting for server response
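
The failing statement in the trace is a per-token-range `SELECT count(*)`, so each Spark partition asks Cassandra to count an entire split before the driver's read timeout expires. One possible direction, sketched here under assumptions and not verified against this cluster, is to make each split smaller and to allow more time per request. The property names `spark.cassandra.read.timeout_ms` and `spark.cassandra.input.split.size_in_mb` are the ones listed in the connector 2.4 configuration reference. The function name `countRows` is hypothetical and the values are illustrative guesses, not tuned recommendations.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper: counts rows with a longer client read timeout and
// smaller input splits. Values are illustrative assumptions only.
def countRows(spark: SparkSession, keyspace: String, tableName: String): Long = {
  // Connection-level setting: time (ms) the driver waits for each response.
  spark.conf.set("spark.cassandra.read.timeout_ms", "300000")

  spark.read
    .format("org.apache.spark.sql.cassandra")
    .option("table", tableName)
    .option("keyspace", keyspace)
    // Smaller splits mean each token-range count covers less data.
    .option("spark.cassandra.input.split.size_in_mb", "64")
    .load()
    .count()
}
```

In a Databricks notebook, where `spark` is predefined, this would be called as `countRows(spark, keyspace, tableName)`. Smaller splits produce more Spark tasks, so the total runtime may grow even if individual requests stop timing out.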

cassandra apache-spark azure-databricks

Source: https://stackoverflow.com/questions/56378040/how-to-fix-read-timeout-exception-in-spark-cassandra-connector
