I have been trying to get SparkR to work. I have read previous questions and blog posts, but I still haven't managed it.
First I had problems installing SparkR; in the end I think I got it installed, but after that I could not run it. My code is detailed below, with the different options I tried to run.
I am currently using RStudio with R 3.6.0. Any help would be greatly appreciated!!
# ***************************#
# Installing Spark Option 1
# ***************************#
install.packages("SparkR")
'''
Does not work
'''
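# As far as I can tell, SparkR has been archived from CRAN, which would explain
# why install.packages("SparkR") fails here; installing from the CRAN archive
# (as in Option 3 below) seems to be the workaround.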
Sys.setenv("JAVA_HOME" = "D:/Program Files/Java/jdk1.8.0_181")
Sys.getenv("JAVA_HOME")
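# Quick sanity check that JAVA_HOME really points at a JDK (on Windows, SparkR
# ends up launching bin/java.exe from this directory):
file.exists(file.path(Sys.getenv("JAVA_HOME"), "bin", "java.exe"))  # expect TRUE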
# ***************************#
# Installing Spark Option 2
# ***************************#
# Find Spark Versions
jsonlite::fromJSON("https://api.github.com/repos/apache/spark/tags")$name
if (!require('devtools')) install.packages('devtools')
devtools::install_github('apache/spark@v2.4.6', subdir='R/pkg')
Sys.setenv(SPARK_HOME='D:/spark-2.3.1-bin-hadoop2.7')
.libPaths(c(file.path(Sys.getenv('SPARK_HOME'), 'R', 'lib'), .libPaths()))
'''
Installation didn't work
'''
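# One thing I only noticed later: the GitHub tag above (v2.4.6) does not match
# the Spark distribution in SPARK_HOME (2.3.1). If the SparkR package has to
# match the distribution, the matching install would presumably be:
devtools::install_github('apache/spark@v2.3.1', subdir = 'R/pkg')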
# ***************************#
# Installing Spark Option 3
# ***************************#
install.packages("sparklyr")
library(sparklyr)
spark_install(version = "2.3.1")
install.packages("https://cran.r-project.org/src/contrib/Archive/SparkR/SparkR_2.3.0.tar.gz", repos = NULL, type="source")
library(SparkR)
'''
One of the two installations worked
'''
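# Since the sparklyr install worked, a way to test whether the local Spark
# 2.3.1 itself is healthy, independently of SparkR (a minimal sketch using
# only the sparklyr API):
library(sparklyr)
sc_lyr <- spark_connect(master = "local", version = "2.3.1")
spark_disconnect(sc_lyr)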
# ***************************#
# Starting Spark Option 1
# ***************************#
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R","lib")))
sparkR.session(master = "local[*]", sparkConfig = list(spark.driver.memory = "2g"))
'''
Spark package found in SPARK_HOME: D:/spark-2.3.1-bin-hadoop2.7
Launching java with spark-submit command D:/spark-2.3.1-bin-hadoop2.7/bin/spark-submit2.cmd --driver-memory "2g" sparkr-shell C:\Users\FELIPE~1\AppData\Local\Temp\RtmpKOxYkx\backend_port34a0263f43f5
Error in if (len > 0) { : argument is of length zero
'''
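# From what I understand, sparkR.session() parses the output of `java -version`
# at launch, and on some Windows setups that parse can come back empty, which
# would match the "if (len > 0)" error above. A small diagnostic sketch to see
# the raw string it would be parsing (java prints its version to stderr):
java_bin <- file.path(Sys.getenv("JAVA_HOME"), "bin", "java.exe")
writeLines(system2(java_bin, "-version", stdout = TRUE, stderr = TRUE))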
# ***************************#
# Starting Spark Option 2
# ***************************#
Sys.setenv("JAVA_HOME" = "D:/Program Files/Java/jdk1.8.0_181")
Sys.getenv("JAVA_HOME")
sparkEnvir <- list(spark.num.executors='5', spark.executor.cores='5')
# initializing Spark context
sc <- sparkR.init(sparkHome = "'D:/spark-2.3.1-bin-hadoop2.7'",
                  sparkEnvir = sparkEnvir)
'''
Error in sparkR.sparkContext(master, appName, sparkHome, convertNamedListToEnv(sparkEnvir), :
JVM is not ready after 10 seconds
In addition: Warning message:
sparkR.init is deprecated.
Use sparkR.session instead.
See help("Deprecated")
'''
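# "JVM is not ready" makes me wonder whether spark-submit ever started; note
# that the sparkHome string above also has an extra pair of inner quotes. A
# sanity check that the real layout exists (bin/spark-submit2.cmd is the
# launcher the other logs mention):
file.exists(file.path("D:/spark-2.3.1-bin-hadoop2.7", "bin", "spark-submit2.cmd"))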
# ***************************#
# Starting Spark Option 3
# ***************************#
Sys.setenv("JAVA_HOME" = "D:/Program Files/Java/jdk1.8.0_181")
Sys.getenv("JAVA_HOME")
sparkEnvir <- list(spark.num.executors='5', spark.executor.cores='5')
# initializing Spark context
sc <- sparkR.session(sparkHome = "'D:/spark-2.3.1-bin-hadoop2.7'",
                     sparkEnvir = sparkEnvir)
'''
Spark not found in SPARK_HOME: D:/spark-2.3.1-bin-hadoop2.7
Spark package found in SPARK_HOME: D:/spark-2.3.1-bin-hadoop2.7
Launching java with spark-submit command D:/spark-2.3.1-bin-hadoop2.7/bin/spark-submit2.cmd sparkr-shell C:\Users\FELIPE~1\AppData\Local\Temp\RtmpKOxYkx\backend_port34a082b15e1
Error in if (len > 0) { : argument is of length zero
'''
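# My best guess at a corrected call: drop the extra inner quotes (otherwise
# Spark is looked up at the literal path "'D:/...'", which would explain the
# "Spark not found in SPARK_HOME" line above) and pass the settings through
# sparkConfig, since sparkR.session() documents sparkConfig rather than
# sparkEnvir:
sparkR.session(master = "local[*]",
               sparkHome = "D:/spark-2.3.1-bin-hadoop2.7",
               sparkConfig = list(spark.executor.cores = "5"))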