org.apache.hadoop.net.ConnectTimeoutException: Call from local_machine to hadoop_server failed on socket timeout exception

jaxagkaj · posted 2021-05-27 in Hadoop

We are trying to run a local spark-submit against a YARN cluster that lives on another server.
The idea is to reproduce, from a local environment, the submit that works from a JupyterLab running in a cloud container: we exported the same configuration files present in the JupyterLab container to the local environment (in our case a laptop, on a VPN that grants access).
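For the exported files to take effect, the local client has to be pointed at them; a minimal sketch of that wiring, where the directory paths are assumptions (use wherever you copied the files from the container):

```shell
# Point the Spark/Hadoop clients at the exported cluster configuration
# (core-site.xml, hdfs-site.xml, yarn-site.xml, hive-site.xml, ...).
export HADOOP_CONF_DIR="$HOME/cluster-conf/hadoop"   # assumed path
export YARN_CONF_DIR="$HADOOP_CONF_DIR"
# If the cluster is kerberized (the container log shows auth:KERBEROS),
# the exported krb5.conf must be visible too:
export KRB5_CONFIG="$HOME/cluster-conf/krb5.conf"    # assumed path
```

spark-submit reads `HADOOP_CONF_DIR`/`YARN_CONF_DIR` to locate the ResourceManager addresses, so a stale or missing export here would also produce connection failures.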
The point is that, from the cloud container, the submit to YARN executes successfully:

2020-03-18 12:45:15 INFO  RequestHedgingRMFailoverProxyProvider:89 - Created wrapped proxy for [rm1, rm2]
2020-03-18 12:45:15 INFO  AHSProxy:42 - Connecting to Application History server at server_address
2020-03-18 12:45:15 INFO  RequestHedgingRMFailoverProxyProvider:147 - Looking for the active RM in [rm1, rm2]...
2020-03-18 12:45:15 INFO  RequestHedgingRMFailoverProxyProvider:171 - Found active RM [rm2]
2020-03-18 12:45:15 INFO  Client:54 - Requesting a new application from cluster with 348 NodeManagers
2020-03-18 12:45:15 INFO  Configuration:2752 - resource-types.xml not found
2020-03-18 12:45:15 INFO  ResourceUtils:418 - Unable to find 'resource-types.xml'.
2020-03-18 12:45:15 INFO  Client:54 - Verifying our application has not requested more than the maximum memory capability of the cluster (346687 MB per container)
2020-03-18 12:45:15 INFO  Client:54 - Will allocate AM container, with 61952 MB memory including 5632 MB overhead
2020-03-18 12:45:15 INFO  Client:54 - Setting up container launch context for our AM
2020-03-18 12:45:15 INFO  Client:54 - Setting up the launch environment for our AM container
2020-03-18 12:45:15 INFO  Client:54 - Preparing resources for our AM container
2020-03-18 12:45:15 INFO  HadoopFSDelegationTokenProvider:54 - getting token for: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-1993770491_1, ugi=id(auth:KERBEROS)]]
2020-03-18 12:45:15 INFO  DFSClient:703 - Created token for owner: HDFS_DELEGATION_TOKEN owner=id, renewer=yarn, realUser=, issueDate=1584535515764, maxDate=1585140315764, sequenceNumber=31968916, masterKeyId=874 on ha-hdfs:edwbiprdmil
2020-03-18 12:45:16 INFO  metastore:377 - Trying to connect to metastore with URI server_uri
2020-03-18 12:45:16 INFO  metastore:473 - Connected to metastore.

At the end, we are connected to the metastore.
Now, after a series of configuration steps, we run the same script from the local machine:

!spark-submit --conf spark.authenticate=True --conf spark.sql.hive.manageFilesourcePartitions=False \
    --name test_spark --master yarn --deploy-mode cluster --driver-memory 55g --num-executors 20 \
    --executor-memory 24g --executor-cores 30 --queue ADHOC_TACTICAL --conf spark.dynamicAllocation.enabled=true \
    --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.maxExecutors=100 \
    --conf spark.dynamicAllocation.minExecutors=10 --conf spark.dynamicAllocation.executorIdleTimeout=240s \
    --conf spark.yarn.tags=tag submit.py

The execution produces the following:

[2020-03-18 13:08:48,677] WARN Unable to load native-hadoop library for your platform... using builtin-java classes where applicable (org.apache.hadoop.util.NativeCodeLoader)
[2020-03-18 13:08:51,835] WARN The short-circuit local reads feature cannot be used because libhadoop cannot be loaded. (org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory)
[2020-03-18 13:08:52,590] INFO Timeline service address: server/ws/v1/timeline/ (org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl)
[2020-03-18 13:09:22,519] INFO Failing over to rm2 (org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider)
[2020-03-18 13:09:48,026] INFO Exception while invoking getClusterMetrics of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 36102ms. (org.apache.hadoop.io.retry.RetryInvocationHandler)
org.apache.hadoop.net.ConnectTimeoutException: Call From local to server failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=server:8032]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:751)
    at org.apache.hadoop.ipc.Client.call(Client.java:1479)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy30.getClusterMetrics(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:206)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy31.getClusterMetrics(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487)
    at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:165)
    at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:165)
    at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
    at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:60)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:164)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1135)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1527)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=dcmipphm13002.edc.nam.gm.com/10.125.2.20:8032]
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
    at org.apache.hadoop.ipc.Client.call(Client.java:1451)
    ... 26 more

Any ideas? We suspect either a firewall blocking our connections, or a permission the server needs in order to accept requests from local machines. Has anyone faced the same problem?
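The firewall hypothesis can be checked quickly from the laptop with a plain TCP probe against the ResourceManager RPC port, using bash's built-in `/dev/tcp` redirection. The hostnames below are placeholders for the actual RM addresses from the exported yarn-site.xml; 8032 is the port shown in the stack trace (`connection-pending remote=server:8032`):

```shell
# Probe each ResourceManager's RPC port from the local machine.
# rm1-host / rm2-host are placeholders; replace with the real RM hostnames.
for rm in rm1-host rm2-host; do
  if timeout 5 bash -c "exec 3<>/dev/tcp/$rm/8032" 2>/dev/null; then
    echo "$rm:8032 reachable"
  else
    echo "$rm:8032 NOT reachable"
  fi
done
```

If the probe times out over the VPN but succeeds from inside the cloud container, the problem is network reachability (firewall or VPN routing), not the Spark configuration.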
