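Hi everyone, I'm writing a servlet that converts a pipeline saved by PySpark into a serialized MLeap model. That way I can run the serialized model in production without any Spark dependencies.
Here is my code: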
// input data
log.error("LOAD CSV");
Dataset<Row> dataset = this.spark.read().format("csv").schema(getSchema()).option("header", "true")
        .option("inferSchema", "true").load("/usr/local/tomcat/csv_data/data.csv");
dataset.show(1, false);
log.error("CSV print schema");
dataset.printSchema();
log.error("LOAD PIPELINE");
PipelineModel pipeline = PipelineModel.load("/usr/local/tomcat/models/data_transformation_pipeline");
Dataset<Row> transformedData = pipeline.transform(dataset);
log.error("LOAD MODEL");
PipelineModel model = PipelineModel.load("/usr/local/tomcat/models/regression_model");
Dataset<Row> prediction = model.transform(transformedData);
Dataset<Row> result = prediction.select(new Column("kpi_specific").alias("expected"), new Column("prediction"));
result.show(1, false);
MleapContext mleapContext = new ContextBuilder().createMleapContext();
BundleBuilder bundleBuilder = new BundleBuilder();
bundleBuilder.save(model, new File("jar:file:/usr/local/tomcat/mleap/model.zip"), mleapContext);
Row row = result.first();
String output = "kpi_specific: " + row.get(0) + " - prediction: " + row.get(1);
I get the following error:
error: incompatible types: PipelineModel cannot be converted to Transformer
How can I get a Transformer object starting from the loaded PipelineModel?
Thanks.