I have written Spark (version 2.4.1) Scala code to read index data from AWS Elasticsearch (version 6.7). I use the IntelliJ IDE. When I create a fat jar, the jar runs correctly on my local machine, but when I run the same jar on EC2 it throws an error (the same Spark version is installed on EC2 and on the local machine).
Command:
spark-submit --class "com.std.SparkElk" --master local[*] "SparkElkData-assembly-0.1.jar"
The error is :-
Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
There is no connectivity problem between Elasticsearch and EC2. My guess is that some dependency is missing from the jar, but then why does the jar run successfully on the local machine and fail on EC2?
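One way to test the dependency theory is to list the fat jar's contents on both machines and confirm that the es-hadoop connector classes were actually bundled. A quick check, assuming the jar name from the spark-submit command above:

```shell
# List the classes inside the assembly jar and look for the
# Elasticsearch connector; if nothing matches, the dependency
# was not packaged into the fat jar.
jar tf SparkElkData-assembly-0.1.jar | grep -i "org/elasticsearch"

# Equivalent check with unzip, if the JDK's jar tool is not on the PATH:
unzip -l SparkElkData-assembly-0.1.jar | grep -i "elasticsearch"
```

If the very same jar file is copied to EC2, its contents are identical by definition, so a difference in behavior would instead point at the runtime environment, for example an older es-hadoop jar on the EC2 Spark installation's classpath shadowing the bundled one.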
Scala code :-
import org.apache.spark.sql.SparkSession
object SparkElk {
def main(args: Array[String]): Unit = {
val spark = SparkSession.builder.appName("Elk Data Processing")
.master("local")
.config("fs.s3a.access.key", "access_key")
.config("fs.s3a.secret.key", "secret_key")
.config("fs.s3a.endpoint", "s3.ap-south-1.amazonaws.com")
.config("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
.config("es.nodes","elasticsearch url")
.config("es.port","443")
.config("es.nodes.wan.only", "true")
.config("es.http.timeout", "5m")
.config("es.net.ssl","true")
.getOrCreate()
val rawDataDf = spark.read.format("es").load("index_name")
rawDataDf.show(1, false)
spark.stop()
}
}
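A common cause of "works locally, fails on the cluster" is that a hard-coded `.master("local")` in the code silently overrides whatever `--master` is passed to spark-submit. A minimal sketch of the same job with the master and the environment-specific values left to submit time; the setting names are the standard es-hadoop and S3A keys, and the placeholder values are assumptions:

```scala
import org.apache.spark.sql.SparkSession

object SparkElk {
  def main(args: Array[String]): Unit = {
    // No .master() here: the --master flag passed to spark-submit decides.
    val spark = SparkSession.builder
      .appName("Elk Data Processing")
      .getOrCreate()

    // The es.* and fs.s3a.* settings can then be supplied per environment,
    // since es-hadoop also reads them from the Spark conf with a "spark." prefix:
    //   spark-submit --conf spark.es.nodes=<elasticsearch url> \
    //                --conf spark.es.port=443 \
    //                --conf spark.es.nodes.wan.only=true ...
    val rawDataDf = spark.read.format("es").load("index_name")
    rawDataDf.show(1, false)
    spark.stop()
  }
}
```

With this shape, the identical jar behaves the same on the laptop and on EC2, and the only thing that changes between environments is the submit command.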
build.sbt:-
name := "SparkElkData"
version := "0.1"
scalaVersion := "2.11.11"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.1"
libraryDependencies += "org.elasticsearch" %% "elasticsearch-spark-20" % "6.7.0"
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case "git.properties" => MergeStrategy.last
case x => MergeStrategy.first
}
mainClass in assembly := Some("com.std.SparkElk")
fullClasspath in Runtime := (fullClasspath in (Compile, run)).value
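Since Spark is already installed on both the local machine and EC2, a frequent refinement is to mark spark-sql as "provided", so that only the connector is bundled and the cluster's own Spark classes are used at runtime. A sketch of the dependency lines under that assumption:

```scala
// Spark is supplied by the local/EC2 installation at runtime,
// so it does not need to be inside the fat jar:
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.1" % "provided"

// The es-hadoop connector must stay in the assembly,
// because a stock Spark installation does not ship it:
libraryDependencies += "org.elasticsearch" %% "elasticsearch-spark-20" % "6.7.0"
```

This also shrinks the assembly considerably and avoids the bundled Spark classes shadowing the installed ones.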
assembly.sbt:-
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")