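I am using H2ODRF and H2OGridSearch, with random discrete grid search for hyperparameter optimization, to build a random forest pipeline model. However, when I set nfolds to any number greater than 1 and call fit(), I get an error. My code looks like this: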
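import ai.h2o.sparkling.ml.algos.{H2ODRF, H2OGridSearch}

// Base random forest estimator; nfolds > 1 enables cross-validation
val drf = new H2ODRF()
  .setFeaturesCols(featuresCols)
  .setLabelCol(labelCol)
  .setColumnsToCategorical(categoricalCols)
  .setSplitRatio(splitRatio)
  .setNfolds(4)

// Hyperparameter grid: the numbers of trees to try
val hyperParams = Map(
  "ntrees" -> Array(10, 50).map(_.asInstanceOf[AnyRef]))

// Grid search over the DRF estimator
val search = new H2OGridSearch()
  .setHyperParameters(hyperParams)
  .setAlgo(drf)

val model = search.fit(data) // data is a Spark DataFrame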
com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 608096 path $.cross_validation_metrics_summary[0].data[0][0]
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:224)
at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:41)
at com.google.gson.internal.bind.ArrayTypeAdapter.read(ArrayTypeAdapter.java:72)
at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:41)
at com.google.gson.internal.bind.ArrayTypeAdapter.read(ArrayTypeAdapter.java:72)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:129)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:220)
at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:41)
at com.google.gson.internal.bind.ArrayTypeAdapter.read(ArrayTypeAdapter.java:72)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:129)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:220)
at com.google.gson.Gson.fromJson(Gson.java:887)
at com.google.gson.Gson.fromJson(Gson.java:852)
at com.google.gson.Gson.fromJson(Gson.java:801)
at ai.h2o.sparkling.backend.utils.RestCommunication$class.ai$h2o$sparkling$backend$utils$RestCommunication$$deserialize(RestCommunication.scala:164)
at ai.h2o.sparkling.backend.utils.RestCommunication$$anonfun$request$1.apply(RestCommunication.scala:147)
at ai.h2o.sparkling.backend.utils.RestCommunication$$anonfun$request$1.apply(RestCommunication.scala:145)
at ai.h2o.sparkling.utils.ScalaUtils$.withResource(ScalaUtils.scala:28)
at ai.h2o.sparkling.backend.utils.RestCommunication$class.request(RestCommunication.scala:145)
at ai.h2o.sparkling.ml.algos.H2OGridSearch.request(H2OGridSearch.scala:46)
at ai.h2o.sparkling.backend.utils.RestCommunication$class.query(RestCommunication.scala:54)
at ai.h2o.sparkling.ml.algos.H2OGridSearch.query(H2OGridSearch.scala:46)
at ai.h2o.sparkling.ml.algos.H2OGridSearch.getGridModels(H2OGridSearch.scala:129)
at ai.h2o.sparkling.ml.algos.H2OGridSearch.fit(H2OGridSearch.scala:163)
at ai.h2o.sparkling.ml.algos.H2OGridSearch.fit(H2OGridSearch.scala:46)
at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:153)
at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:149)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:44)
at scala.collection.SeqViewLike$AbstractTransformed.foreach(SeqViewLike.scala:37)
at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:149)
... 59 elided
Caused by: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 608096 path $.cross_validation_metrics_summary[0].data[0][0]
at com.google.gson.stream.JsonReader.beginObject(JsonReader.java:385)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:213)
... 90 more
The error appears to be caused by the cross_validation_metrics_summary field, which is only returned when nfolds is greater than 1. Is there a workaround for this problem?
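To illustrate what this kind of Gson error means, here is a minimal sketch that should produce the same class of exception outside of Sparkling Water. The classes Cell, Table and Response are hypothetical stand-ins I made up for illustration; they are not the real H2O or Sparkling Water schema classes, and the payload is hand-written rather than taken from a real response. It only assumes Gson is on the classpath (it appears in the stack trace above).

```scala
import com.google.gson.{Gson, JsonSyntaxException}

// Hypothetical target types, for illustration only; these are NOT the
// real Sparkling Water / H2O schema classes.
class Cell { var value: String = _ }
class Table { var data: Array[Array[Cell]] = _ }
class Response { var cross_validation_metrics_summary: Array[Table] = _ }

object GsonMismatchDemo {
  // Returns Some(message) if Gson rejects the payload, None if it parses.
  def parseError(json: String): Option[String] =
    try {
      new Gson().fromJson(json, classOf[Response])
      None
    } catch {
      case e: JsonSyntaxException => Some(e.getMessage)
    }

  def main(args: Array[String]): Unit = {
    // The table cells are plain JSON strings, but the target type declares
    // each cell as an object, so Gson fails on the first cell.
    val payload =
      """{"cross_validation_metrics_summary":[{"data":[["mean"]]}]}"""
    println(parseError(payload))
  }
}
```

If the real schema class declares the table cells as an object type while the server sends plain strings, the deserializer would fail in the same way, which would be consistent with the path $.cross_validation_metrics_summary[0].data[0][0] in the trace.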
Edit: I am using the prostate data with Spark version 2.4.4, Scala version 2.11.12, and the following Sparkling Water version: ai.h2o:sparkling-water-package_2.11:3.30.0.4-1-2.4.
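Edit: After browsing the Sparkling Water source code, the problem seems to originate in a misconfigured schema, GridSchemaV99. Is there a setting or configuration I should update so that a different schema is used?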