I have a standalone Spark installation, and I am using it to submit a SparkR job to an existing Cloudera CDH cluster.
Apache Spark Version
1.5.0, Hadoop 2.6
Cloudera Spark Version
1.5.0-cdh5.5.1, Hadoop 2.6.0-cdh5.5.1
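To confirm the two builds really differ, the driver-side and cluster-side Spark versions can be compared directly. The first path is taken from the setup above; the `/opt/cloudera/parcels/CDH` symlink is an assumption based on the usual CDH parcel layout.

```shell
# Version of the standalone Spark used to launch the driver
/opt/BIG-DATA/spark-1.5.0-bin-hadoop2.6/bin/spark-submit --version

# Version shipped with the CDH parcel (path assumed from the typical CDH 5.5.1 layout)
/opt/cloudera/parcels/CDH/bin/spark-submit --version
```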
Code:
# Load SparkR from the standalone Spark installation
library(SparkR, lib.loc = "/opt/BIG-DATA/spark-1.5.0-bin-hadoop2.6/R/lib")
# Connect to the standalone master
sc <- sparkR.init(master = "spark://10.103.25.39:7077", appName = "SparkR_demo_RTA", sparkHome = "/opt/BIG-DATA/spark-1.5.0-bin-hadoop2.6", sparkEnvir = list(spark.executor.memory = '512m'))
sqlContext <- sparkRSQL.init(sc)
# Convert the built-in faithful dataset to a Spark DataFrame and preview it
df <- createDataFrame(sqlContext, faithful)
head(df)
sparkR.stop()
I then submit this sparkR-testcluster.R file as follows:
export HADOOP_CONF_DIR=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/etc/hadoop/conf.pseudo
export SPARK_DIST_CLASSPATH=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/etc/hadoop/conf.pseudo
./bin/spark-submit --master spark://10.103.25.39:7077 /opt/BIG-DATA/SparkR/sparkR-testcluster.R
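One way to rule out mixed jars entirely, sketched here as an untested alternative, is to submit with the Spark that ships inside the CDH parcel rather than the standalone download, so the driver and executors deserialize classes built from the same source tree. The `SPARK_HOME` path below is an assumption based on the usual parcel layout.

```shell
# Assumed parcel location; adjust to the actual CDH install on this host
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export HADOOP_CONF_DIR=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/etc/hadoop/conf.pseudo

# Driver now uses the CDH-built Spark end to end
$SPARK_HOME/bin/spark-submit \
  --master spark://10.103.25.39:7077 \
  /opt/BIG-DATA/SparkR/sparkR-testcluster.R
```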
However, I get the following error (if I understand correctly, it is caused by a version mismatch):
16/12/14 12:51:26 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20161214124958-0014/151 on hostPort 10.103.40.186:7078 with 4 cores, 512.0 MB RAM
16/12/14 12:51:26 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161214124958-0014/151 is now RUNNING
16/12/14 12:51:26 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161214124958-0014/151 is now LOADING
16/12/14 12:51:26 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, 10.103.40.207): java.io.InvalidClassException: org.apache.spark.sql.types.StructType; local class incompatible: stream classdesc serialVersionUID = -2623502157469710728, local class serialVersionUID = 1299744747852393705
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
16/12/14 12:51:26 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 1.0 (TID 2, 10.103.25.39, PROCESS_LOCAL, 13045 bytes)
16/12/14 12:51:26 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161214124958-0014/151 is now EXITED (Command exited with code 1)
16/12/14 12:51:26 INFO cluster.SparkDeploySchedulerBackend: Executor app-20161214124958-0014/151 removed: Command exited with code 1
16/12/14 12:51:26 INFO cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 151
16/12/14 12:51:26 INFO client.AppClient$ClientEndpoint: Executor added: app-20161214124958-0014/152 on worker-20161208195437-10.103.40.186-7078 (10.103.40.186:7078) with 4 cores
....................
....................
16/12/14 12:51:26 ERROR r.RBackendHandler: dfToCols on org.apache.spark.sql.api.r.SQLUtils failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, 10.103.40.207): java.io.InvalidClassException: org.apache.spark.sql.types.StructType; local class incompatible: stream classdesc serialVersionUID = -2623502157469710728, local class serialVersionUID = 1299744747852393705
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.
Calls: head ... collect -> collect -> .local -> callJStatic -> invokeJava
Execution halted
16/12/14 12:51:26 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161214124958-0014/154 is now LOADING
16/12/14 12:51:26 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161214124958-0014/154 is now EXITED (Command exited with code 1)
16/12/14 12:51:26 INFO cluster.SparkDeploySchedulerBackend: Executor app-20161214124958-0014/154 removed: Command exited with code 1
16/12/14 12:51:26 INFO cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 154
16/12/14 12:51:26 INFO client.AppClient$ClientEndpoint: Executor added: app-20161214124958-0014/155 on worker-20161208195437-10.103.40.186-7078 (10.103.40.186:7078) with 4 cores
I don't see where I'm going wrong.
Any help would be appreciated.
EDIT:
I would like to know which jar I should add when running my job so that I don't hit this problem. I understand it is a mismatch because the wrong class is being picked up. How can I make it pick the correct class?
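A minimal sketch of pinning both sides to one spark-assembly jar, assuming the CDH parcel's `jars` directory and jar name below (verify the exact filename on the workers first; these paths are guesses based on the CDH 5.5.1 layout, not taken from this setup):

```shell
# Hypothetical: find the assembly jar the CDH workers actually have on disk
ls /opt/cloudera/parcels/CDH/jars/ | grep spark-assembly

# Then point the driver and every executor at that single jar when submitting
./bin/spark-submit \
  --master spark://10.103.25.39:7077 \
  --conf spark.driver.extraClassPath=/opt/cloudera/parcels/CDH/jars/spark-assembly-1.5.0-cdh5.5.1-hadoop2.6.0-cdh5.5.1.jar \
  --conf spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/jars/spark-assembly-1.5.0-cdh5.5.1-hadoop2.6.0-cdh5.5.1.jar \
  /opt/BIG-DATA/SparkR/sparkR-testcluster.R
```

Entries on `extraClassPath` are prepended to the executor classpath, so whichever `StructType` is in that jar should win over any conflicting copy.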