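I am trying to execute Hive SQL through Spark code, but it throws the error shown below. I am only able to select data from Hive tables.
My Spark version is 1.6.1 and my Hive version is 1.2.1.
Command used to run spark-submit:
spark-submit --master local[8] --files /srv/data/app/spark/conf/hive-site.xml test_hive.py
Code:-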
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
from pyspark.sql import HiveContext
sc=SparkContext()
sqlContext = SQLContext(sc)
HiveContext = HiveContext(sc)
#HiveContext.setConf("yarn.timeline-service.enabled","false")
#HiveContext.sql("SET spark.sql.crossJoin.enabled=false")
HiveContext.sql("use default")
HiveContext.sql("TRUNCATE TABLE default.test_table")
HiveContext.sql("LOAD DATA LOCAL INPATH '/srv/data/data_files/*' OVERWRITE INTO TABLE default.test_table")
df = HiveContext.sql("select * from version")
for x in df.collect():
    print(x)
Error:-
17386 [Thread-3] ERROR org.apache.spark.sql.hive.client.ClientWrapper -
======================
HIVE FAILURE OUTPUT
======================
SET spark.sql.inMemoryColumnarStorage.compressed=true
SET spark.sql.thriftServer.incrementalCollect=true
SET spark.sql.hive.convertMetastoreParquet=false
SET spark.sql.broadcastTimeout=800
SET spark.sql.hive.thriftServer.singleSession=true
SET spark.sql.inMemoryColumnarStorage.partitionPruning=true
SET spark.sql.crossJoin.enabled=true
SET hive.support.sql11.reserved.keywords=false
SET spark.sql.crossJoin.enabled=false
OK
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. ClassCastException: attempting to castjar:file:/srv/data/OneClickProvision_1.2.2/files/app/spark/assembly/target/scala-2.10/spark-assembly-1.6.2-SNAPSHOT-hadoop2.6.1.jar!/javax/ws/rs/ext/RuntimeDelegate.classtojar:file:/srv/data/OneClickProvision_1.2.2/files/app/spark/assembly/target/scala-2.10/spark-assembly-1.6.2-SNAPSHOT-hadoop2.6.1.jar!/javax/ws/rs/ext/RuntimeDelegate.class
======================
END HIVE FAILURE OUTPUT
======================
Traceback (most recent call last):
File "/home/iip/hist_load.py", line 10, in <module>
HiveContext.sql("TRUNCATE TABLE default.tbl_wmt_pos_file_test")
File "/srv/data/OneClickProvision_1.2.2/files/app/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 580, in sql
File "/srv/data/OneClickProvision_1.2.2/files/app/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/srv/data/OneClickProvision_1.2.2/files/app/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
File "/srv/data/OneClickProvision_1.2.2/files/app/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o46.sql.
: org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. ClassCastException: attempting to castjar:file:/srv/data/OneClickProvision_1.2.2/files/app/spark/assembly/target/scala-2.10/spark-assembly-1.6.2-SNAPSHOT-hadoop2.6.1.jar!/javax/ws/rs/ext/RuntimeDelegate.classtojar:file:/srv/data/OneClickProvision_1.2.2/files/app/spark/assembly/target/scala-2.10/spark-assembly-1.6.2-SNAPSHOT-hadoop2.6.1.jar!/javax/ws/rs/ext/RuntimeDelegate.class
2 Answers
pxy2qtax #1
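From the post here:
The Spark job fails with a ClassCastException because conflicting versions of the same class are present in the YARN and Spark jars. Set the following property on the HiveContext: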
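The property itself is not shown in this answer. The question's code contains a commented-out `setConf("yarn.timeline-service.enabled","false")` call, so the sketch below assumes that is the setting meant; treat it as a guess to try, not a confirmed fix. The helper names `disable_yarn_timeline_service` and `run_truncate` are hypothetical and used only for illustration.

```python
def disable_yarn_timeline_service(hive_ctx):
    """Apply the setting to any object exposing setConf(key, value)."""
    # Assumption: this is the property the answer refers to. It is taken from
    # the commented-out line in the question, not from the answer text.
    hive_ctx.setConf("yarn.timeline-service.enabled", "false")
    return hive_ctx


def run_truncate():
    """Build a HiveContext, apply the setting, then retry the failing statement."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext()
    hive_ctx = disable_yarn_timeline_service(HiveContext(sc))
    # Same statement that raised the DDLTask error in the question.
    hive_ctx.sql("TRUNCATE TABLE default.test_table")
```

Calling `run_truncate()` from the script passed to spark-submit applies the setting before any Hive DDL runs, which is presumably when it needs to be in place.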
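vyswwuz2 #2
"I am only able to select data from Hive tables."
This is perfectly normal and expected behavior. Spark SQL is not intended to be fully compatible with HiveQL or to implement the full set of features that Hive provides.
In general, some compatibility is preserved while Spark SQL converges toward the SQL:2003 standard, but there is no guarantee that it will be kept in the future.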