将spark设置为配置单元的默认执行引擎

vxbzzdmp  于 2021-06-02  发布在  Hadoop
关注(0)|答案(1)|浏览(373)

hadoop 2.7.3、spark 2.1.0和hive 2.1.1。
我正在尝试将spark设置为配置单元的默认执行引擎。我将$spark\u home/jars中的所有jar上传到hdfs文件夹,并将scala库、spark core和spark网络公共jar复制到hive\u home/lib。然后,我用以下属性配置了hive-site.xml:

<property>
    <name>hive.execution.engine</name>
    <value>spark</value>
  </property>
  <property>
    <name>spark.master</name>
    <value>spark://master:7077</value>
    <description>Spark Master URL</description>
  </property>
  <property>
    <name>spark.eventLog.enabled</name>
    <value>true</value>
    <description>Spark Event Log</description>
  </property>
  <property>
    <name>spark.eventLog.dir</name>
    <value>hdfs://master:8020/user/spark/eventLogging</value>
    <description>Spark event log folder</description>
  </property>
  <property>
    <name>spark.executor.memory</name>
    <value>512m</value>
    <description>Spark executor memory</description>
  </property>
  <property>
    <name>spark.serializer</name>
    <value>org.apache.spark.serializer.KryoSerializer</value>
    <description>Spark serializer</description>
  </property>
  <property>
  <name>spark.yarn.jars</name>
  <value>hdfs://master:8020/user/spark/spark-jars/*</value>
</property>

在hive shell中,我执行了以下操作:

hive> add jar ${env:HIVE_HOME}/lib/scala-library-2.11.8.jar;
Added [/usr/local/hive/hive-2.1.1/lib/scala-library-2.11.8.jar] to class path
Added resources: [/usr/local/hive/hive-2.1.1/lib/scala-library-2.11.8.jar]
hive> add jar ${env:HIVE_HOME}/lib/spark-core_2.11-2.1.0.jar;
Added [/usr/local/hive/hive-2.1.1/lib/spark-core_2.11-2.1.0.jar] to class path
Added resources: [/usr/local/hive/hive-2.1.1/lib/spark-core_2.11-2.1.0.jar]
hive> add jar ${env:HIVE_HOME}/lib/spark-network-common_2.11-2.1.0.jar;
Added [/usr/local/hive/hive-2.1.1/lib/spark-network-common_2.11-2.1.0.jar] to class path
Added resources: [/usr/local/hive/hive-2.1.1/lib/spark-network-common_2.11-2.1.0.jar]
hive> set hive.execution.engine=spark;

当我试图执行
配置单元>从表名中选择计数(*);
我得到以下信息:

Query ID = hduser_20170130230014_6e23dacc-78e8-4bd6-9fad-1344f6d0569e
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

Hive日志显示 java.lang.NoClassDefFoundError: org/apache/spark/JavaSparkListener ```
ERROR [main] client.SparkClientImpl: Error while waiting for client to connect.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel client 'cc10915b-da97-4fd7-9960-49c03ea380d7'. Error: Child process exited before connecting back with error log Warning: Ignoring non-spark config property: hive.spark.client.server.connect.timeout=90000
Warning: Ignoring non-spark config property: hive.spark.client.rpc.threads=8
Warning: Ignoring non-spark config property: hive.spark.client.connect.timeout=1000
Warning: Ignoring non-spark config property: hive.spark.client.secret.bits=256
Warning: Ignoring non-spark config property: hive.spark.client.rpc.max.size=52428800
java.lang.NoClassDefFoundError: org/apache/spark/JavaSparkListener
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:229)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:695)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.JavaSparkListener
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 19 more

at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:106)
at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:99)
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:95)
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:69)
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:62)
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114)
at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:136)
at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:89)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Caused by: java.lang.RuntimeException: Cancel client 'cc10915b-da97-4fd7-9960-49c03ea380d7'. Error: Child process exited before connecting back with error log Warning: Ignoring non-spark config property: hive.spark.client.server.connect.timeout=90000
Warning: Ignoring non-spark config property: hive.spark.client.rpc.threads=8
Warning: Ignoring non-spark config property: hive.spark.client.connect.timeout=1000
Warning: Ignoring non-spark config property: hive.spark.client.secret.bits=256
Warning: Ignoring non-spark config property: hive.spark.client.rpc.max.size=52428800
java.lang.NoClassDefFoundError: org/apache/spark/JavaSparkListener
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:229)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:695)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.JavaSparkListener
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 19 more

请帮助我在spark 2.1.0上集成hive 2.1.1。
x0fgdtte

x0fgdtte1#

这是spark中的一个bug,org.apache.spark.javasparklistener类已从spark 2.0.0中删除。它已被修复,并在审查过程中。如果修复将获得批准,那么它将在下一个spark中可用(可以是spark 2.2.0)
https://issues.apache.org/jira/browse/spark-17563

相关问题