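When I use sbt run with the configuration pointing at the master of a remote cluster, the workers do not do anything useful, and the following warning is printed repeatedly in the sbt run log: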
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
My Spark configuration looks like this:
@transient lazy val conf: SparkConf = new SparkConf()
  .setMaster("spark://master-ip:7077")
  .setAppName("HelloWorld")
  .set("spark.executor.memory", "1g")
  .set("spark.driver.memory", "12g")
@transient lazy val sc: SparkContext = new SparkContext(conf)
val lines = sc.textFile("hdfs://master-public-dns:9000/test/1000.csv")
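I know this warning usually appears when the cluster is misconfigured and the workers either have no resources or were never started. However, according to my Spark UI (at master-ip:8080), the worker nodes appear to be alive with enough RAM and CPU cores. They even try to execute my application, but then they exit and leave this in the stderr log: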
INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled;
users with view permissions: Set(ubuntu, myuser);
groups with view permissions: Set(); users with modify permissions: Set(ubuntu, myuser); groups with modify permissions: Set()
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
...
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply from 192.168.0.11:35996 in 120 seconds
... 8 more
ERROR RpcOutboxMessage: Ask timeout before connecting successfully
Any ideas?
1 Answer
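Cannot receive any reply from 192.168.0.11:35996 in 120 seconds
Can you telnet to this port on this IP from a worker? Maybe your driver machine has multiple network interfaces, so the driver is advertising an address the workers cannot reach. Try setting SPARK_LOCAL_IP in $SPARK_HOME/conf/spark-env.sh.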
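
If telnet is not installed on the worker, the same reachability check can be sketched in a few lines of Scala using only the JDK socket API. This is a minimal sketch, not part of the original answer: the object name PortCheck and the helper canConnect are hypothetical, it assumes Scala 2.13 or later (for scala.util.Using), and the host and port are the ones from the TimeoutException above. The driver port is typically picked at random for each run unless spark.driver.port is set, so run the check while the application is still up and use the address from the current log.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Using

// Hypothetical helper, not part of Spark: a telnet-like TCP reachability check.
object PortCheck {
  /** Returns true if a TCP connection to host:port is established within timeoutMs. */
  def canConnect(host: String, port: Int, timeoutMs: Int = 3000): Boolean =
    Using(new Socket()) { socket =>
      // connect throws on refusal, unreachable host, or timeout;
      // Using closes the socket and turns the exception into a Failure.
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    }.getOrElse(false)

  def main(args: Array[String]): Unit = {
    // Address taken from the TimeoutException in the worker's stderr log.
    val host = "192.168.0.11"
    val port = 35996
    println(s"$host:$port reachable from this machine: ${canConnect(host, port)}")
  }
}
```

Run it on a worker node. If the connection fails there but succeeds on the driver machine itself, the driver is probably advertising an interface the workers cannot route to, which is the situation SPARK_LOCAL_IP is meant to address.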