Building a GLM model with SparkR on Windows is very slow and the R code fails with an error

1mrurvl1 posted on 2021-05-27 in Spark

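The dataset is large: 30 columns and 200,000 records. I am building a GLM model with SparkR, but fitting the model takes far too long and also fails with an error. How can I reduce the model-building time with SparkR and resolve the error shown below? Any suggestions for improving the code would be appreciated.
R code: set SPARK_HOME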

Sys.setenv(SPARK_HOME="C:/spark/spark-2.0.0-bin-hadoop2.7")

Set the library path

.libPaths(c(file.path(Sys.getenv("SPARK_HOME"),"R","lib"), .libPaths()))

Sys.setenv(JAVA_HOME="C:/Program Files/Java/jdk1.7.0_71")

Load the SparkR library

library(SparkR)
library(rJava)

sc <- sparkR.session(enableHiveSupport = FALSE,master = "local[*]",appName = "SparkR-Modi",sparkConfig = list(spark.sql.warehouse.dir="file:///c:/tmp/spark-warehouse"))
sqlContext <- sparkRSQL.init(sc)
spdf <- read.df(sqlContext, "C:/Users/prasann/Desktop/V/bigdata11.csv", source = "com.databricks.spark.csv", header = "true")
showDF(spdf)

GLM model

md <- glm(NP_OfferCurrentResponse ~., family = "binomial", data = spdf)
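
A note on the loading step, since it may bear on both the speed and the error: read.df is called here without inferSchema, and the Spark CSV reader then returns every column as a string. With the formula `~ .`, string columns are one-hot encoded, so 29 predictors can expand into a very large number of dummy features. A minimal sketch of a typed load follows, assuming Spark 2.0's built-in "csv" source. The function name fit_glm_typed and its arguments are hypothetical, and it has not been run against this dataset.

```r
# Hypothetical helper: load the CSV with inferred column types, then fit the GLM.
# Calls are namespaced with SparkR:: so nothing touches Spark until the function is called.
fit_glm_typed <- function(csv_path, label = "NP_OfferCurrentResponse") {
  # Start (or reuse) a local Spark session; sparkRSQL.init is not needed in Spark 2.0.
  SparkR::sparkR.session(master = "local[*]", appName = "SparkR-Modi")

  # inferSchema = "true" asks the CSV reader to detect numeric columns
  # instead of returning every column as a string.
  df <- SparkR::read.df(csv_path, source = "csv",
                        header = "true", inferSchema = "true")

  # Inspect the types before fitting; remaining string columns will be one-hot encoded.
  SparkR::printSchema(df)

  # Same model as in the question: label regressed on all other columns.
  SparkR::spark.glm(df, as.formula(paste(label, "~ .")), family = "binomial")
}
```

Calling fit_glm_typed("C:/Users/prasann/Desktop/V/bigdata11.csv") needs a working Spark installation. If printSchema still shows string columns with many distinct values, dropping or recoding them before fitting is worth trying.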

Error (the model runs very slowly and then fails):

> md <- glm(NP_OfferCurrentResponse ~., family = "binomial", data = spdf)
Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
java.lang.AssertionError: assertion failed: lapack.dppsv returned 226.
at scala.Predef$.assert(Predef.scala:170)
at org.apache.spark.mllib.linalg.CholeskyDecomposition$.solve(CholeskyDecomposition.scala:40)
at org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:140)
at org.apache.spark.ml.regression.GeneralizedLinearRegression$FamilyAndLink.initialize(GeneralizedLinearRegression.scala:340)
at org.apache.spark.ml.regression.GeneralizedLinearRegression.train(GeneralizedLinearRegression.scala:275)
at org.apache.spark.ml.regression.GeneralizedLinearRegression.train(GeneralizedLinearRegression.scala:139)
at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
at org.apache.spark.ml.Predictor.fit(Predictor.scala:71)
at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:149)
at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:145)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.c
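
On the error itself: the trace shows WeightedLeastSquares solving the normal equations through CholeskyDecomposition, and LAPACK's dppsv returns a positive code i when the leading minor of order i is not positive definite. One plausible reading, not confirmed for this dataset, is that the expanded feature matrix contains linearly dependent columns (for example duplicated, constant, or perfectly correlated predictors). The base R sketch below reproduces the same kind of failure on a toy matrix. The data and variable names are made up for illustration.

```r
# Toy design matrix with an exact linear dependency:
# the intercept column equals d1 + d2.
d1 <- c(1, 1, 0, 0)
d2 <- 1 - d1
X <- cbind(intercept = 1, d1 = d1, d2 = d2)

# Gram matrix X'X, the matrix that the normal equations factorize.
gram <- crossprod(X)

# Cholesky needs a positive definite matrix; with dependent columns it fails.
full <- tryCatch(chol(gram), error = function(e) e)

# Dropping one of the dependent columns makes the factorization succeed.
reduced <- tryCatch(chol(crossprod(X[, c("intercept", "d1")])),
                    error = function(e) e)

if (inherits(full, "error")) {
  message("Full design failed: ", conditionMessage(full))
}
if (is.matrix(reduced)) {
  message("Reduced design factorized without error.")
}
```

If this is what is happening in the Spark job, removing redundant predictors before calling glm would be the first thing to try. The toy only shows the mechanism and says nothing about which of the 30 columns are involved.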
