Spark worker timeout

nukf8bse · published 2021-06-02 in Hadoop

When I launch my application with sbt run and the configuration points at the master node of a remote cluster, the workers never do any useful work, and the following warning is logged repeatedly in the sbt run output:

WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

My Spark configuration looks like this:

@transient lazy val conf: SparkConf = new SparkConf()
    .setMaster("spark://master-ip:7077")
    .setAppName("HelloWorld")
    .set("spark.executor.memory", "1g")
    .set("spark.driver.memory", "12g")

@transient lazy val sc: SparkContext = new SparkContext(conf)

val lines   = sc.textFile("hdfs://master-public-dns:9000/test/1000.csv")

I know this warning usually appears when the cluster is misconfigured and the workers either lack resources or were never started at all. However, according to my Spark UI (at master-ip:8080), the workers appear to be alive, with sufficient RAM and CPU cores. They even attempt to execute my application, but they exit and leave this in their stderr logs:

INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; 
users  with view permissions: Set(ubuntu, myuser); 
groups with view permissions: Set(); users  with modify permissions: Set(ubuntu, myuser); groups with modify permissions: Set()

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
...
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply from 192.168.0.11:35996 in 120 seconds
... 8 more
ERROR RpcOutboxMessage: Ask timeout before connecting successfully

Any ideas?


oxcyiej7 · answer #1

Cannot receive any reply from 192.168.0.11:35996 in 120 seconds

Can you telnet to this IP and port from a worker? Perhaps your driver machine has multiple network interfaces; try setting SPARK_LOCAL_IP in $SPARK_HOME/conf/spark-env.sh.
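A minimal sketch of that check and fix; the IP and port below are taken from the timeout message in the question and stand in for the asker's actual values:

```shell
# On a worker node: verify the driver's RPC endpoint is reachable.
# If this hangs or is refused, the workers cannot connect back to the driver.
telnet 192.168.0.11 35996

# On the driver machine, in $SPARK_HOME/conf/spark-env.sh:
# pin Spark to the interface that the workers can actually reach.
export SPARK_LOCAL_IP=192.168.0.11
```

The same address can also be set per application via `spark.driver.host` on the SparkConf, which avoids editing spark-env.sh.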
