ClassNotFoundException: org.apache.spark.SparkConf with Spark on Hive

qgelzfjb · posted on 2021-05-29 in Hadoop

I am trying to use Spark as the execution engine for Hive, but I get the error below. Spark 1.5.0 is installed, and I am using Hive 1.1.0 with Hadoop 2.7.0. The hive_emp table was created in Hive as an ORC-format table.
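For reference, the table was created roughly like this (a hypothetical DDL, reconstructed from the desc hive_emp output further down; the original CREATE statement is not shown in the post):

hive (Koushik)> CREATE TABLE hive_emp (
              >   empid  INT,
              >   empnm  VARCHAR(50),
              >   deptid INT)
              > STORED AS ORC;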

hive (Koushik)> insert into table hive_emp values (2,'Koushik',1);
Query ID = hduser_20150921072727_feba8363-258d-4d0b-8976-662e404bca88
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.generateSparkConf(HiveSparkClientFactory.java:140)
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:56)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:116)
    at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:113)
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:95)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1183)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkConf
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    ... 25 more
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. org/apache/spark/SparkConf

I have also set the Spark path and the execution engine in the Hive shell.

hduser@ubuntu:~$ spark-shell
    Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_21)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.

scala> exit;
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
hduser@ubuntu:~$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/auxlib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/auxlib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Logging initialized using configuration in file:/usr/lib/hive/conf/hive-log4j.properties
hive (default)> use Koushik;
OK
Time taken: 0.593 seconds
hive (Koushik)> set spark.home=/usr/local/src/spark;

I have also created a .hiverc, as shown below:

hduser@ubuntu:/usr/lib/hive/conf$ cat .hiverc
SET hive.cli.print.header=true;
set hive.cli.print.current.db=true;
set hive.auto.convert.join=true;
SET hbase.scan.cacheblock=0;
SET hbase.scan.cache=10000;
SET hbase.client.scanner.cache=10000;
SET hive.execution.engine=spark;

The error details with debug mode enabled are as follows:

hduser@ubuntu:~$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/auxlib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/auxlib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Logging initialized using configuration in file:/usr/lib/hive/conf/hive-log4j.properties
hive (default)> use Koushik;
OK
Time taken: 0.625 seconds
hive (Koushik)> set hive --hiveconf hive.root.logger=DEBUG
              > ;
hive (Koushik)> set hive.execution.engine=spark;
hive (Koushik)> desc hive_emp;
OK
col_name    data_type   comment
empid                   int                                         
empnm                   varchar(50)                                 
deptid                  int                                         
Time taken: 0.173 seconds, Fetched: 3 row(s)
hive (Koushik)> select * from hive_emp;
OK
hive_emp.empid  hive_emp.empnm  hive_emp.deptid
Time taken: 1.689 seconds
hive (Koushik)> insert into table hive_emp values (2,'Koushik',1);
Query ID = hduser_20151015112525_c96a458b-34f8-42ac-ab11-52c32479a29a
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
java.lang.NoSuchMethodError: org.apache.spark.scheduler.LiveListenerBus.addListener(Lorg/apache/spark/scheduler/SparkListener;)V
    at org.apache.hadoop.hive.ql.exec.spark.LocalHiveSparkClient.<init>(LocalHiveSparkClient.java:85)
    at org.apache.hadoop.hive.ql.exec.spark.LocalHiveSparkClient.getInstance(LocalHiveSparkClient.java:69)
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:56)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:116)
    at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:113)
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:95)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1183)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. org.apache.spark.scheduler.LiveListenerBus.addListener(Lorg/apache/spark/scheduler/SparkListener;)V
hive (Koushik)>

I have executed the above insert twice, and it failed both times. Please find the hive.log generated today attached.

cigdeys3 1#

The cause of this error is that Hive cannot find the Spark assembly jar.
Export SPARK_HOME=/usr/local/src/spark, or add the spark-assembly jar to the Hive lib folder, and the problem will be resolved.
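A minimal shell sketch of both options, assuming the Spark installation lives under /usr/local/src/spark and Hive under /usr/lib/hive as in the question (the jar name is the one that appears in the SLF4J output above):

# Option 1: point Hive at the Spark installation (e.g. in ~/.bashrc or hive-env.sh)
export SPARK_HOME=/usr/local/src/spark

# Option 2: copy the Spark assembly jar onto Hive's classpath
cp /usr/local/src/spark/lib/spark-assembly-1.5.0-hadoop2.6.0.jar /usr/lib/hive/lib/

# Restart the Hive CLI afterwards so the new classpath is picked up.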

xfb7svmp 2#

Like you, I ran into the same problem when deploying Hive on Spark. After some research I found that Hive was not loading the Spark jars, so I made the following changes to hive-env.sh.
Add to hive-env.sh (note: adjust the Spark path to your own installation):

# Point Hive at the Spark installation and put every jar in $SPARK_HOME/jars
# onto Hive's auxiliary classpath.
export SPARK_HOME=/opt/module/spark-2.4.5-bin-without-hive
export SPARK_JARS=""
for jar in `ls $SPARK_HOME/jars`; do
    export SPARK_JARS=$SPARK_JARS:$SPARK_HOME/jars/$jar
done
export HIVE_AUX_JARS_PATH=$SPARK_JARS

In other words, Hive did not load the Spark jars at startup, so configuring the environment in hive-env.sh fixes it. Pay attention to the paths here: the hadoop-lzo jar at the end of my own configuration below is optional, so you can also just use the configuration above (it is the same, minus the lzo jar).

export SPARK_HOME=/opt/module/spark-2.4.5-bin-without-hive
export SPARK_JARS=""
for jar in `ls $SPARK_HOME/jars`; do
    export SPARK_JARS=$SPARK_JARS:$SPARK_HOME/jars/$jar
done
# $SPARK_JARS already starts with ':', so the lzo jar and the Spark jars
# concatenate into one valid classpath setting.
export HIVE_AUX_JARS_PATH=/opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.21-SNAPSHOT.jar$SPARK_JARS

mqkwyuun 3#

I also faced the same problem on my Ubuntu 14.04 VirtualBox machine. Here are the steps I followed to fix it:

hive> set spark.home=/usr/local/spark;
hive> set spark.master=local;
hive> SET hive.execution.engine=spark;

Then add the spark-assembly jar file as shown below:

hive> ADD jar /usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar;
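If you would rather not repeat those commands in every session, the same settings can go into the .hiverc shown in the question. A sketch, assuming the paths and the jar from this answer:

set spark.home=/usr/local/spark;
set spark.master=local;
SET hive.execution.engine=spark;
ADD jar /usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar;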
