I am trying to set up Apache Spline on Windows. My Spark version is 2.4.0 and my Scala version is 2.12.0. I followed the steps mentioned here: https://absaoss.github.io/spline/ . I ran the docker-compose commands and the UI started:
wget https://raw.githubusercontent.com/AbsaOSS/spline/release/0.5/docker-compose.yml
docker-compose up
After that, I tried to run the command below to start the PySpark shell:
pyspark \
--packages za.co.absa.spline.agent.spark:spark-2.4-spline-agent-bundle_2.12:0.5.3 \
--conf "spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener" \
--conf "spark.spline.producer.url=http://localhost:9090/producer"
This gave me the following error:
C:\Users\AyanBiswas\Documents\softwares\spark-2.4.0-bin-hadoop2.7\python\pyspark\shell.py:45: UserWarning: Failed to initialize Spark session.
  warnings.warn("Failed to initialize Spark session.")
Traceback (most recent call last):
  File "C:\Users\AyanBiswas\Documents\softwares\spark-2.4.0-bin-hadoop2.7\python\pyspark\shell.py", line 41, in <module>
    spark = SparkSession._create_shell_session()
  File "C:\Users\AyanBiswas\Documents\softwares\spark-2.4.0-bin-hadoop2.7\python\pyspark\sql\session.py", line 583, in _create_shell_session
    return SparkSession.builder.getOrCreate()
  File "C:\Users\AyanBiswas\Documents\softwares\spark-2.4.0-bin-hadoop2.7\python\pyspark\sql\session.py", line 183, in getOrCreate
    session._jsparkSession.sessionState().conf().setConfString(key, value)
  File "C:\Users\AyanBiswas\Documents\softwares\spark-2.4.0-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "C:\Users\AyanBiswas\Documents\softwares\spark-2.4.0-bin-hadoop2.7\python\pyspark\sql\utils.py", line 63, in deco
    return f(*a,**kw)
  File "C:\Users\AyanBiswas\Documents\softwares\spark-2.4.0-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o31.sessionState.
: java.lang.NoSuchMethodError: org.apache.spark.internal.Logging.$init$(Lorg/apache/spark/internal/Logging;)V
I tried to look up the cause of this error, and most posts point to a Scala version mismatch. But I am using Scala 2.12.0, and the Spline package mentioned above is also built for Scala 2.12. So what am I missing?
2 Answers

Answer 1:
I would try updating your Scala and Spark to matching minor versions. Spline internally uses Spark 2.4.2 and Scala 2.12.10, so I would go with those. But I am not sure whether this is the cause of the problem.
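A quick way to check which Scala version a Spark distribution was actually built with is `spark-submit --version` (or `pyspark --version`), which prints it on startup. The snippet below only simulates the relevant output line with `echo`; the assumption (worth verifying on your machine) is that the standard pre-built Spark 2.4.0 download reports Scala 2.11, which would conflict with a `_2.12` agent bundle:

```shell
# On a real install, run:
#   spark-submit --version 2>&1 | grep "Using Scala version"
# Simulated output line for a stock pre-built Spark 2.4.0 download
# (assumption: it ships with Scala 2.11.12, not 2.12):
echo "Using Scala version 2.11.12"
```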
Answer 2:
I resolved this error by using Spark 2.4.2 and Scala 2.12.10. The cause is that:
all Spark 2.x releases are pre-built with Scala 2.11, and
only Spark 2.4.2 is pre-built with Scala 2.12.
This is mentioned on the Spark download page here:
Note that Spark 2.x is pre-built with Scala 2.11, except version 2.4.2, which is pre-built with Scala 2.12. Spark 3.0+ is pre-built with Scala 2.12.
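Another way to confirm the mismatch locally is to look at the scala-library jar shipped under the Spark installation's jars directory; its filename encodes the exact Scala version the distribution was built with. A minimal sketch, assuming a stock Spark 2.4.0 install whose jars directory is expected to contain scala-library-2.11.12.jar (the `SPARK_HOME` path is hypothetical):

```shell
# On a real install, list the jar:
#   ls "$SPARK_HOME/jars" | grep scala-library
# Extract the Scala binary version (e.g. 2.11) from such a filename,
# which is what must match the agent bundle suffix (_2.11 vs _2.12):
jar="scala-library-2.11.12.jar"
echo "$jar" | sed -E 's/scala-library-([0-9]+\.[0-9]+)\..*/\1/'
```

If this prints 2.11 while the `--packages` coordinate ends in `_2.12`, that mismatch is what produces the `NoSuchMethodError` on `Logging.$init$` seen in the question.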