Saving a Spark ML model to HDFS

Posted by yzuktlbb on 2021-06-02 in Hadoop

I am trying to save a model that I created with the Spark ML library.
However, it gives me an error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.ml.PipelineModel.save(Ljava/lang/String;)V
    at com.sf.prediction$.main(prediction.scala:61)
    at com.sf.prediction.main(prediction.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Here are my dependencies:

    <dependency>
        <groupId>org.scalatest</groupId>
        <artifactId>scalatest_2.10</artifactId>
        <version>2.1.7</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.4.3</version>
        <type>maven-plugin</type>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-parser-combinators</artifactId>
        <version>2.11.0-M4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.6.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-csv</artifactId>
        <version>1.2</version>
    </dependency>

    <dependency>
        <groupId>com.databricks</groupId>
        <artifactId>spark-csv_2.10</artifactId>
        <version>1.4.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.10</artifactId>
        <version>1.6.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_2.10</artifactId>
        <version>1.6.0</version>
    </dependency>

I would also like to save the DataFrame generated by the model as a CSV file.

model.transform(df).select("features","label","prediction").show()

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

import org.apache.spark.sql.hive.HiveContext

import org.apache.spark.ml.feature.OneHotEncoder
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.PipelineModel._
import org.apache.spark.ml.feature.{IndexToString, StringIndexer, VectorIndexer}
import org.apache.spark.ml.util.MLWritable

object prediction {
  def main(args: Array[String]): Unit = {

    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("conversion")
    val sc = new SparkContext(conf)

    val hiveContext = new HiveContext(sc)

    val df = hiveContext.sql("select * from prediction_test")
    df.show()
    val credit_indexer = new StringIndexer().setInputCol("transaction_credit_card").setOutputCol("creditCardIndex").fit(df)
    val category_indexer = new StringIndexer().setInputCol("transaction_category").setOutputCol("categoryIndex").fit(df)
    val location_flag_indexer = new StringIndexer().setInputCol("location_flag").setOutputCol("locationIndex").fit(df)
    val label_indexer = new StringIndexer().setInputCol("fraud").setOutputCol("label").fit(df)

    val assembler =  new VectorAssembler().setInputCols(Array("transaction_amount", "creditCardIndex","categoryIndex","locationIndex")).setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)
    val pipeline = new Pipeline().setStages(Array(credit_indexer, category_indexer, location_flag_indexer, label_indexer, assembler, lr))

    val model = pipeline.fit(df)

    pipeline.save("/user/f42h/prediction/pipeline")
    model.save("/user/f42h/prediction/model")
 //   val sameModel = PipelineModel.load("/user/bob/prediction/model")
    model.transform(df).select("features","label","prediction")

  }
}
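
For the CSV part of the question, one possible approach is sketched below, assuming the spark-csv package from the dependency list above is on the classpath. `PredictionCsv` and `savePredictions` are hypothetical names, not something from the original post. Only the scalar columns are written, on the assumption that a vector column such as `features` does not map cleanly to a single CSV field.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical helper, shown only as an illustration.
object PredictionCsv {

  /** Writes the "label" and "prediction" columns of a scored DataFrame as CSV. */
  def savePredictions(scored: DataFrame, path: String): Unit =
    scored
      .select("label", "prediction")        // the vector column "features" is left out on purpose
      .write
      .format("com.databricks.spark.csv")   // data source provided by the spark-csv dependency
      .option("header", "true")             // emit a header row with the column names
      .mode(SaveMode.Overwrite)             // replace any earlier output at the same path
      .save(path)
}
```

It could be called as `PredictionCsv.savePredictions(model.transform(df), outputPath)`, where `outputPath` is an HDFS directory of your choosing. Spark writes a directory of part files at that location rather than a single CSV file.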
cigdeys3  #1

You are using Spark 1.6.0 and, AFAIK, save/load for ML models is only available from 2.0 onward. You can use the artifacts with the 2.0.0-preview version: http://search.maven.org/#search%7cga%7c1%7cg%3aorg.apache.spark%20v%3a2.0.0-preview
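
A `NoSuchMethodError` is raised when the class found on the runtime classpath lacks a method that was present at compile time, so it can help to confirm what the running Spark actually exposes before or after changing versions. The following is a minimal sketch and not part of the original answer: `MethodCheck` and `hasMethod` are hypothetical names, and it relies only on plain Java reflection.

```scala
import scala.util.Try

// Hypothetical diagnostic helper, shown only as an illustration.
object MethodCheck {

  /** Returns true if `className` can be loaded and has a public method
    * `methodName` that takes exactly the given parameter types. */
  def hasMethod(className: String, methodName: String, paramTypes: Class[_]*): Boolean =
    Try(Class.forName(className).getMethod(methodName, paramTypes: _*)).isSuccess

  def main(args: Array[String]): Unit = {
    // The class and method named in the stack trace of the question.
    val available = hasMethod("org.apache.spark.ml.PipelineModel", "save", classOf[String])
    println(s"PipelineModel.save(String) available at runtime: $available")
  }
}
```

Submitting this with the same `spark-submit` setup as the failing job shows whether the method exists on the classpath that job really uses. Printing `sc.version` inside the job is a complementary check, since it reports the Spark version that is running rather than the one declared in the POM.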
