My environment: Ubuntu 64-bit, Spark 2.4.5, Jupyter notebook.
With a working internet connection, I get no errors:
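from pyspark.sql import SparkSession

# Let Spark resolve MMLSpark and its dependencies from Maven (requires internet access)
spark = SparkSession.builder \
    .appName("Churn Scoring LightGBM") \
    .master("local[4]") \
    .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:0.18.1") \
    .getOrCreate()
from mmlspark.lightgbm import LightGBMClassifier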
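For the case without an internet connection, I downloaded the related jars and list them like this (the style recommended by the Cloudera docs):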
import os
mmlspark_jars_dir = os.path.join(os.environ["SPARK_HOME"], "mmlspark_jars")
mmlspark_jars = [os.path.join(mmlspark_jars_dir, x) for x in os.listdir(mmlspark_jars_dir)]
print(mmlspark_jars)
['/home/erkan/spark/mmlspark_jars/com.jcraft_jsch-0.1.54.jar',
'/home/erkan/spark/mmlspark_jars/com.microsoft.ml.spark_mmlspark_2.11-0.18.1.jar',
'/home/erkan/spark/mmlspark_jars/commons-codec_commons-codec-1.10.jar',
'/home/erkan/spark/mmlspark_jars/org.scalatest_scalatest_2.11-3.0.5.jar',
'/home/erkan/spark/mmlspark_jars/org.apache.httpcomponents_httpcore-4.4.10.jar',
'/home/erkan/spark/mmlspark_jars/org.openpnp_opencv-3.2.0-1.jar',
'/home/erkan/spark/mmlspark_jars/commons-logging_commons-logging-1.2.jar',
'/home/erkan/spark/mmlspark_jars/com.github.vowpalwabbit_vw-jni-8.7.0.2.jar',
'/home/erkan/spark/mmlspark_jars/org.apache.httpcomponents_httpclient-4.5.6.jar',
'/home/erkan/spark/mmlspark_jars/org.scala-lang_scala-reflect-2.11.12.jar',
'/home/erkan/spark/mmlspark_jars/org.scala-lang.modules_scala-xml_2.11-1.0.6.jar',
'/home/erkan/spark/mmlspark_jars/com.microsoft.cntk_cntk-2.4.jar',
'/home/erkan/spark/mmlspark_jars/io.spray_spray-json_2.11-1.3.2.jar',
'/home/erkan/spark/mmlspark_jars/org.scalactic_scalactic_2.11-3.0.5.jar',
'/home/erkan/spark/mmlspark_jars/com.microsoft.ml.lightgbm_lightgbmlib-2.2.350.jar']
I then had to modify the SparkSession like this:
spark = SparkSession.builder \
    .appName("Churn Scoring LightGBM") \
    .master("local[4]") \
    .config("spark.jars", ",".join(mmlspark_jars)) \
    .getOrCreate()
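I watched the terminal and everything seemed fine: Spark started up. Then I checked the Spark UI, and the jars were listed there.
Then I tried the import: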
from mmlspark.lightgbm import LightGBMClassifier
and got this error:
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-10-df498625321c> in <module>
----> 1 from mmlspark.lightgbm import LightGBMClassifier
ModuleNotFoundError: No module named 'mmlspark'
I don't understand why the second approach doesn't work, even though I can see the same jars in the Spark UI.
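One way to narrow this down is to check whether any of the local jars actually bundles the Python package named in the error. The sketch below is only a diagnostic, not a confirmed fix: it treats each jar as a zip archive and reports which top-level directories contain .py files. The helper names python_packages_in_jar and jars_providing are hypothetical and not part of Spark or MMLSpark. It rests on the assumption that spark.jars only puts jars on the JVM classpath, whereas jars resolved through spark.jars.packages are also made importable from Python.

```python
import zipfile


def python_packages_in_jar(jar_path):
    """Return the sorted top-level directories in a jar that contain .py files."""
    packages = set()
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            # Only entries such as "pkg/module.py" count; skip top-level files.
            if name.endswith(".py") and "/" in name:
                packages.add(name.split("/", 1)[0])
    return sorted(packages)


def jars_providing(package, jar_paths):
    """Return the jars from jar_paths that bundle the given Python package."""
    return [p for p in jar_paths if package in python_packages_in_jar(p)]
```

In the notebook this could be called as jars_providing("mmlspark", mmlspark_jars). If it returns the mmlspark jar, it may be worth trying spark.sparkContext.addPyFile(path_to_that_jar) before the import, since addPyFile adds the archive to the Python path. Whether that is sufficient for this setup is an assumption that has not been verified here.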