加载mlpyspark模型失败

wtzytmuj  于 2021-07-09  发布在  Spark
关注(0)|答案(1)|浏览(392)

我有几个我无法加载的回归模型。这是spark init:

from pyspark.sql import SparkSession, SQLContext
from pyspark.ml.regression import DecisionTreeRegressor

spark = SparkSession.builder \
    .appName("Linear Regression Model") \
    .config('spark.executor.cores','2') \
    .config("spark.executor.memory", "5gb") \
    .master("local[*]") \
    .getOrCreate() 

sc = spark.sparkContext

以下是模型拟合并成功保存:


# Decision Tree Regression

decisionTree = DecisionTreeRegressor(featuresCol = "Features", labelCol = "SalePrice", maxDepth = 15, maxBins = 32)
decisionTreeModel = decisionTree.fit(train_vector)

import os

decisionTreeModel.save(os.path.join(".", 'decisionTreeModel'))

但当我把它装回去的时候:

persistedModel = DecisionTreeRegressor.load("decisionTreeModel")

我得到这个错误:

Py4JJavaError: An error occurred while calling o1201.load.
: java.lang.NoSuchMethodException: org.apache.spark.ml.regression.DecisionTreeRegressionModel.<init>(java.lang.String)
    at java.lang.Class.getConstructor0(Class.java:3082)
    at java.lang.Class.getConstructor(Class.java:1825)
    at org.apache.spark.ml.util.DefaultParamsReader.load(ReadWrite.scala:468)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

有人知道如何加载pyspark模型吗?

plupiseo

plupiseo1#

错误消息不是很有帮助,但是我认为正确的方法是调用 load 模型的方法,而不是估计量的方法。该模型已对数据进行了拟合,这不同于只包含设定值/参数而不拟合的估计器。
所以你可以试试这个:

from pyspark.ml.regression import DecisionTreeRegressionModel

persistedModel = DecisionTreeRegressionModel.load("decisionTreeModel")

以下是加载估计器与加载模型的比较,供您参考:

from pyspark.ml.regression import DecisionTreeRegressor, DecisionTreeRegressionModel

decisionTree = DecisionTreeRegressor(featuresCol = "Features", labelCol = "SalePrice", maxDepth = 15, maxBins = 32)
decisionTree.save('tree')
persistedEstimator = DecisionTreeRegressor.load('tree')

decisionTreeModel = decisionTree.fit(df)
decisionTreeModel.save('model')
persistedModel = DecisionTreeRegressionModel.load('model')

相关问题