Unable to run Hive SQL through Spark

jljoyd4f · posted 2021-06-26 · in Hive

I am trying to execute Hive SQL through Spark code, but it throws the error shown below. I am only able to select data from Hive tables.
My Spark version is 1.6.1 and my Hive version is 1.2.1.
Command used to run spark-submit:

    spark-submit --master local[8] --files /srv/data/app/spark/conf/hive-site.xml test_hive.py
Code:

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext()
    # HiveContext is required for Hive DDL/DML; a plain SQLContext can only query
    hiveContext = HiveContext(sc)

    #hiveContext.setConf("yarn.timeline-service.enabled", "false")
    #hiveContext.sql("SET spark.sql.crossJoin.enabled=false")
    hiveContext.sql("use default")
    hiveContext.sql("TRUNCATE TABLE default.test_table")
    hiveContext.sql("LOAD DATA LOCAL INPATH '/srv/data/data_files/*' OVERWRITE INTO TABLE default.test_table")
    df = hiveContext.sql("select * from version")

    for x in df.collect():
        print(x)

Error:

17386 [Thread-3] ERROR org.apache.spark.sql.hive.client.ClientWrapper  -
======================
HIVE FAILURE OUTPUT
======================
SET spark.sql.inMemoryColumnarStorage.compressed=true
SET spark.sql.thriftServer.incrementalCollect=true
SET spark.sql.hive.convertMetastoreParquet=false
SET spark.sql.broadcastTimeout=800
SET spark.sql.hive.thriftServer.singleSession=true
SET spark.sql.inMemoryColumnarStorage.partitionPruning=true
SET spark.sql.crossJoin.enabled=true
SET hive.support.sql11.reserved.keywords=false
SET spark.sql.crossJoin.enabled=false
OK
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. ClassCastException: attempting to cast jar:file:/srv/data/OneClickProvision_1.2.2/files/app/spark/assembly/target/scala-2.10/spark-assembly-1.6.2-SNAPSHOT-hadoop2.6.1.jar!/javax/ws/rs/ext/RuntimeDelegate.class to jar:file:/srv/data/OneClickProvision_1.2.2/files/app/spark/assembly/target/scala-2.10/spark-assembly-1.6.2-SNAPSHOT-hadoop2.6.1.jar!/javax/ws/rs/ext/RuntimeDelegate.class

======================
END HIVE FAILURE OUTPUT
======================

Traceback (most recent call last):
  File "/home/iip/hist_load.py", line 10, in <module>
    HiveContext.sql("TRUNCATE TABLE default.tbl_wmt_pos_file_test")

 File "/srv/data/OneClickProvision_1.2.2/files/app/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 580, in sql
  File "/srv/data/OneClickProvision_1.2.2/files/app/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/srv/data/OneClickProvision_1.2.2/files/app/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
  File "/srv/data/OneClickProvision_1.2.2/files/app/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o46.sql.
: org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. ClassCastException: attempting to cast jar:file:/srv/data/OneClickProvision_1.2.2/files/app/spark/assembly/target/scala-2.10/spark-assembly-1.6.2-SNAPSHOT-hadoop2.6.1.jar!/javax/ws/rs/ext/RuntimeDelegate.class to jar:file:/srv/data/OneClickProvision_1.2.2/files/app/spark/assembly/target/scala-2.10/spark-assembly-1.6.2-SNAPSHOT-hadoop2.6.1.jar!/javax/ws/rs/ext/RuntimeDelegate.class

Answer 1 (pxy2qtax)

From this post: the Spark job fails with a ClassCastException because different versions of the same class are present in the YARN and the Spark assembly jars.
Set the following property on the HiveContext:

    val hc = new org.apache.spark.sql.hive.HiveContext(sc)
    hc.setConf("yarn.timeline-service.enabled", "false")
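
The question uses PySpark rather than Scala, so here is a minimal sketch of the same fix in Python (my adaptation, not part of the original answer); the setConf call must run before the first statement that touches the metastore:

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext()
    hiveContext = HiveContext(sc)

    # Disabling the YARN timeline client keeps Hive's DDL path from loading
    # the conflicting javax.ws.rs/Jersey classes bundled in the assembly jar.
    hiveContext.setConf("yarn.timeline-service.enabled", "false")

    hiveContext.sql("TRUNCATE TABLE default.test_table")

The same property should also be settable at submit time with --conf spark.hadoop.yarn.timeline-service.enabled=false, since Spark copies spark.hadoop.* entries into the Hadoop Configuration.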

Answer 2 (vyswwuz2)

"I am only able to select data from Hive tables."

This is completely normal and expected behavior. Spark SQL is not intended to be fully compatible with HiveQL, nor to implement the full set of features Hive provides.
Overall, some compatibility has been preserved as Spark SQL converges toward the SQL:2003 standard, but there is no guarantee it will be kept in the future.
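
Where a specific HiveQL statement is unsupported or hits a bug like the one above, the usual workaround is to do the same work through the DataFrame API. Below is a rough sketch that replaces the TRUNCATE plus LOAD DATA steps from the question; it assumes the files are comma-separated text whose columns line up with default.test_table, so the parsing must be adapted to the real schema:

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext()
    hiveContext = HiveContext(sc)

    # Read the local files; the file:// scheme forces the local filesystem.
    raw = sc.textFile("file:///srv/data/data_files/*")

    # Parse each line into a tuple of string columns (hypothetical layout).
    rows = raw.map(lambda line: tuple(line.split(",")))
    df = rows.toDF()

    # overwrite=True replaces the table contents, like TRUNCATE + LOAD DATA.
    df.write.insertInto("default.test_table", overwrite=True)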
