I read data from Cassandra like this:
df = spark.read\
.format("org.apache.spark.sql.cassandra")\
.options(**configs)\
.options(table=tablename, keyspace=keyspace)\
.option("ssl", True)\
.option("sslmode", "require")\
.load()
This df is a PySpark DataFrame. I can call show() and printSchema() on it, but when I run
df.count()
it throws this error:
An error was encountered:
An error occurred while calling o1394.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage
48.0 failed 4 times, most recent failure: Lost task 19.3 in stage 48.0 (TID 2053, js-
56258-63801-i-32-w-1.net, executor 9): java.lang.IllegalArgumentException:
requirement failed: Column not found in Java driver Row: count
How can I fix this? Thanks in advance.
1 Answer
I assume it doesn't always fail at the same stage. If that's the case, you can try adjusting the read/write tuning parameters:
https://github.com/datastax/spark-cassandra-connector/blob/b2.4/doc/reference.md#read-tuning-parameters
https://github.com/datastax/spark-cassandra-connector/blob/b2.4/doc/reference.md#write-tuning-parameters
When launching pyspark, you need to pass each option in as
--conf spark.cassandra.<option>=<value>
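As a concrete sketch, a launch command might look like the following. The option names are taken from the b2.4 reference pages linked above; the package coordinate and the specific values are illustrative assumptions, not tested recommendations for your cluster:

```shell
# Launch pyspark with Cassandra connector read-tuning options.
# Option names come from the spark-cassandra-connector b2.4 reference;
# the values below are illustrative assumptions only.
pyspark \
  --packages datastax:spark-cassandra-connector:2.4.0-s_2.11 \
  --conf spark.cassandra.input.split.size_in_mb=32 \
  --conf spark.cassandra.input.fetch.size_in_rows=500 \
  --conf spark.cassandra.read.timeout_ms=60000
```

Lowering input.split.size_in_mb creates more, smaller Spark partitions per Cassandra token range, and lowering input.fetch.size_in_rows reduces the size of each page the driver pulls, which can help if individual tasks are timing out or overloading executors.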