I'm trying to run Spark 3.0 on AWS ECS on EC2. I have a Spark worker service and a Spark master service. When I start the worker against the master's hostname (exposed via ECS service discovery), it fails to resolve it. When I hardcode the IP address/port instead, it works.
Below are some commands I ran inside the worker Docker container after SSHing into the ECS-backed EC2 instance:
# as can be seen below, the master host is reachable from the worker Docker container
root@b87fad6a3ffa:/usr/spark-3.0.0# ping spark_master.mynamespace
PING spark_master.mynamespace (172.21.60.11) 56(84) bytes of data.
64 bytes from ip-172-21-60-11.eu-west-1.compute.internal (172.21.60.11): icmp_seq=1 ttl=254 time=0.370 ms
# the following works just fine -- starting the worker successfully and connecting to the master:
root@b87fad6a3ffa:/usr/spark-3.0.0# /bin/sh -c "bin/spark-class org.apache.spark.deploy.worker.Worker spark://172.21.60.11:7077"
# !!! this is the failing case
root@b87fad6a3ffa:/usr/spark-3.0.0# /bin/sh -c "bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark_master.mynamespace:7077"
20/07/01 21:03:41 INFO worker.Worker: Started daemon with process name: 422@b87fad6a3ffa
20/07/01 21:03:41 INFO util.SignalUtils: Registered signal handler for TERM
20/07/01 21:03:41 INFO util.SignalUtils: Registered signal handler for HUP
20/07/01 21:03:41 INFO util.SignalUtils: Registered signal handler for INT
20/07/01 21:03:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/07/01 21:03:42 INFO spark.SecurityManager: Changing view acls to: root
20/07/01 21:03:42 INFO spark.SecurityManager: Changing modify acls to: root
20/07/01 21:03:42 INFO spark.SecurityManager: Changing view acls groups to:
20/07/01 21:03:42 INFO spark.SecurityManager: Changing modify acls groups to:
20/07/01 21:03:42 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
20/07/01 21:03:42 INFO util.Utils: Successfully started service 'sparkWorker' on port 39915.
20/07/01 21:03:42 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[main,5,main]
org.apache.spark.SparkException: Invalid master URL: spark://spark_master.mynamespace:7077
at org.apache.spark.util.Utils$.extractHostPortFromSparkUrl(Utils.scala:2397)
at org.apache.spark.rpc.RpcAddress$.fromSparkURL(RpcAddress.scala:47)
at org.apache.spark.deploy.worker.Worker$.$anonfun$startRpcEnvAndEndpoint$3(Worker.scala:859)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:859)
at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:828)
at org.apache.spark.deploy.worker.Worker.main(Worker.scala)
20/07/01 21:03:42 INFO util.ShutdownHookManager: Shutdown hook called
# following is just FYI
root@b87fad6a3ffa:/usr/spark-3.0.0# /bin/sh -c "bin/spark-class org.apache.spark.deploy.worker.Worker --help"
20/07/01 21:16:10 INFO worker.Worker: Started daemon with process name: 552@b87fad6a3ffa
20/07/01 21:16:10 INFO util.SignalUtils: Registered signal handler for TERM
20/07/01 21:16:10 INFO util.SignalUtils: Registered signal handler for HUP
20/07/01 21:16:10 INFO util.SignalUtils: Registered signal handler for INT
Usage: Worker [options] <master>
Master must be a URL of the form spark://hostname:port
Options:
-c CORES, --cores CORES Number of cores to use
-m MEM, --memory MEM Amount of memory to use (e.g. 1000M, 2G)
-d DIR, --work-dir DIR Directory to run apps in (default: SPARK_HOME/work)
-i HOST, --ip IP Hostname to listen on (deprecated, please use --host or -h)
-h HOST, --host HOST Hostname to listen on
-p PORT, --port PORT Port to listen on (default: random)
--webui-port PORT Port for web UI (default: 8081)
--properties-file FILE Path to a custom Spark properties file.
Default is conf/spark-defaults.conf.
...
The master itself works fine; I can see its admin UI on port 8080, etc.
Any idea why Spark fails to resolve the hostname and only works with an IP address?
1 Answer
The problem was the underscore (_) I was using in the hostnames. When I changed spark_master and spark_worker to use a hyphen (-) instead, the problem was solved. Related links:
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6587184
URI.getHost returns null. Why?
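To reproduce the root cause in isolation, here is a minimal sketch (my own illustration, not part of the original answer) of how java.net.URI treats an underscore in a hostname; this is exactly the behavior described in JDK-6587184:

import java.net.URI

object UnderscoreHostDemo {
  def main(args: Array[String]): Unit = {
    // An underscore is not a legal hostname character for java.net.URI's
    // server-based authority parsing, so the parser silently falls back to a
    // registry-based authority, where getHost and getPort carry no values.
    val bad = new URI("spark://spark_master.mynamespace:7077")
    println(bad.getHost) // null
    println(bad.getPort) // -1

    // With a hyphen the same URL parses as expected.
    val good = new URI("spark://spark-master.mynamespace:7077")
    println(good.getHost) // spark-master.mynamespace
    println(good.getPort) // 7077
  }
}

The worker passes the master URL through this parser, so the null host surfaces as the "Invalid master URL" SparkException in the log above, even though DNS itself resolves the name fine (as the ping showed).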
The relevant check in the Spark codebase:
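The frame at Utils.scala:2397 in the stack trace above points at org.apache.spark.util.Utils.extractHostPortFromSparkUrl. Paraphrased from memory of the Spark 3.0.x sources (verify against the exact tag you run), the host == null branch is what rejects the underscore hostname:

import org.apache.spark.SparkException

// Paraphrased from org.apache.spark.util.Utils (Spark 3.0.x); consult the
// source for the authoritative version.
def extractHostPortFromSparkUrl(sparkUrl: String): (String, Int) = {
  try {
    val uri = new java.net.URI(sparkUrl)
    val host = uri.getHost // null when the hostname contains an underscore
    val port = uri.getPort
    if (uri.getScheme != "spark" ||
        host == null ||
        port < 0 ||
        (uri.getPath != null && !uri.getPath.isEmpty) ||
        uri.getFragment != null ||
        uri.getQuery != null ||
        uri.getUserInfo != null) {
      throw new SparkException("Invalid master URL: " + sparkUrl)
    }
    (host, port)
  } catch {
    case e: java.net.URISyntaxException =>
      throw new SparkException("Invalid master URL: " + sparkUrl, e)
  }
}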