PySpark ModuleNotFoundError: No module named 'mmlspark'

eulz3vhy  posted on 2021-05-27  in Spark

My environment: Ubuntu 64-bit, Spark 2.4.5, Jupyter notebook.
With a working internet connection, the following runs without any errors:

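from pyspark.sql import SparkSession

# Let Spark resolve mmlspark and its dependencies from Maven (needs internet access).
spark = SparkSession.builder \
.appName("Churn Scoring LightGBM") \
.master("local[4]") \
.config("spark.jars.packages","com.microsoft.ml.spark:mmlspark_2.11:0.18.1") \
.getOrCreate()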

from mmlspark.lightgbm import LightGBMClassifier

Without an internet connection, however, I collect the relevant jars myself (in the style recommended by the Cloudera docs):

import os
mmlspark_jars_dir = os.path.join(os.environ["SPARK_HOME"], "mmlspark_jars")
mmlspark_jars = [os.path.join(mmlspark_jars_dir, x) for x in os.listdir(mmlspark_jars_dir)]
print(mmlspark_jars)
['/home/erkan/spark/mmlspark_jars/com.jcraft_jsch-0.1.54.jar',
 '/home/erkan/spark/mmlspark_jars/com.microsoft.ml.spark_mmlspark_2.11-0.18.1.jar',
 '/home/erkan/spark/mmlspark_jars/commons-codec_commons-codec-1.10.jar',
 '/home/erkan/spark/mmlspark_jars/org.scalatest_scalatest_2.11-3.0.5.jar',
 '/home/erkan/spark/mmlspark_jars/org.apache.httpcomponents_httpcore-4.4.10.jar',
 '/home/erkan/spark/mmlspark_jars/org.openpnp_opencv-3.2.0-1.jar',
 '/home/erkan/spark/mmlspark_jars/commons-logging_commons-logging-1.2.jar',
 '/home/erkan/spark/mmlspark_jars/com.github.vowpalwabbit_vw-jni-8.7.0.2.jar',
 '/home/erkan/spark/mmlspark_jars/org.apache.httpcomponents_httpclient-4.5.6.jar',
 '/home/erkan/spark/mmlspark_jars/org.scala-lang_scala-reflect-2.11.12.jar',
 '/home/erkan/spark/mmlspark_jars/org.scala-lang.modules_scala-xml_2.11-1.0.6.jar',
 '/home/erkan/spark/mmlspark_jars/com.microsoft.cntk_cntk-2.4.jar',
 '/home/erkan/spark/mmlspark_jars/io.spray_spray-json_2.11-1.3.2.jar',
 '/home/erkan/spark/mmlspark_jars/org.scalactic_scalactic_2.11-3.0.5.jar',
 '/home/erkan/spark/mmlspark_jars/com.microsoft.ml.lightgbm_lightgbmlib-2.2.350.jar']

I had to change the SparkSession like this:

spark = SparkSession.builder \
.appName("Churn Scoring LightGBM") \
.master("local[4]") \
.config("spark.jars", ",".join(mmlspark_jars)) \
.getOrCreate()
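
A small variation on the jar listing, as a sketch only: `os.listdir` returns every entry in the directory, so filtering on the `.jar` suffix keeps stray files out of the value passed to `spark.jars`. The helper name `local_jars_conf` is hypothetical and not part of Spark or mmlspark.

```python
import os


def local_jars_conf(jars_dir):
    """Return a comma-separated string of the .jar files found in jars_dir."""
    jars = sorted(
        os.path.join(jars_dir, name)
        for name in os.listdir(jars_dir)
        if name.endswith(".jar")  # skip anything that is not a jar
    )
    # spark.jars expects a single comma-separated string of paths.
    return ",".join(jars)
```

Under that assumption, the builder call could take `.config("spark.jars", local_jars_conf(mmlspark_jars_dir))` in place of the manual join.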

Watching the terminal, everything seemed fine and the Spark session was created. I then checked the Spark UI.

Then I tried the import:

from mmlspark.lightgbm import LightGBMClassifier

It raised this error:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-10-df498625321c> in <module>
----> 1 from mmlspark.lightgbm import LightGBMClassifier

ModuleNotFoundError: No module named 'mmlspark'

I don't understand why the import fails with the second method, even though I see the same jars listed in the Spark UI.
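
One possible explanation, offered as an assumption rather than something confirmed for this setup: `spark.jars` puts the jars on the JVM classpath, but the Python interpreter can only import `mmlspark` if an archive holding its Python sources is also on `sys.path`. The sketch below assumes the mmlspark jar bundles its Python package (not verified here). It uses only the standard library to find such jars and add them to `sys.path` on the driver; both function names are hypothetical.

```python
import sys
import zipfile


def jars_bundling_package(jar_paths, package):
    """Return the jars whose archive contains Python files under `package/`."""
    prefix = package + "/"
    hits = []
    for path in jar_paths:
        # A jar is a zip archive, so the standard zipfile module can list it.
        if not zipfile.is_zipfile(path):
            continue
        with zipfile.ZipFile(path) as archive:
            names = archive.namelist()
        if any(n.startswith(prefix) and n.endswith(".py") for n in names):
            hits.append(path)
    return hits


def add_to_python_path(jar_paths):
    """Put each jar on sys.path so Python's zipimport can load code from it."""
    for path in jar_paths:
        if path not in sys.path:
            sys.path.insert(0, path)


# Hypothetical usage in the notebook (driver side only):
# add_to_python_path(jars_bundling_package(mmlspark_jars, "mmlspark"))
# from mmlspark.lightgbm import LightGBMClassifier
```

If `jars_bundling_package` returns an empty list, the assumption does not hold and the Python package would have to come from somewhere else. PySpark's `SparkContext.addPyFile` is the existing call for shipping Python dependencies to a running context; whether it resolves this particular case is untested here.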
