docker flink: TaskManager cannot find the JobManager on a different node in Docker Swarm

Posted by moiiocjp on 2021-06-25 in Flink

This happens even though the nodes are in the same subnet.
I am using the docker-flink project: https://github.com/apache/flink/tree/master/flink-contrib/docker-flink
I am creating the services with the following commands:

docker network create -d overlay overlay 
docker service create --name jobmanager --env JOB_MANAGER_RPC_ADDRESS=jobmanager -p 8081:8081 --network overlay --constraint 'node.hostname == ubuntu-swarm-manager' flink jobmanager 
docker service create --name taskmanager --env JOB_MANAGER_RPC_ADDRESS=jobmanager --network overlay --constraint 'node.hostname != ubuntu-swarm-manager' flink taskmanager

This is the error I get:

- Trying to register at JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager (attempt 4, timeout: 4000 milliseconds)

Here is my environment configuration:
Node: ubuntu swarm master, Azure VM Standard D4s v3 (4 vCPUs, 16 GB memory), Docker version 17.03.1-ce, build c6d412e
Node: azure-swarm-worker-1, Azure VM Standard D2 v2 Promo (2 vCPUs, 7 GB memory), Docker version 17.09.0-ce, build afdb6d4
Flink: using image 1.3.2-hadoop2-scala_2.10
This is from the logs of the container running the taskmanager:
It starts normally...

Starting Task Manager
config file:
jobmanager.rpc.address: jobmanager
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 1024
taskmanager.numberOfTaskSlots: 2
taskmanager.memory.preallocate: false
parallelism.default: 1
jobmanager.web.port: 8081
blob.server.port: 6124
query.server.port: 6125
Starting taskmanager as a console application on host 00afd4130a94.

Then some errors appear (scroll right):

2017-11-02 14:06:51,064 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils            - Trying to select the network interface and address to use by connecting to the leading JobManager.
    2017-11-02 14:06:51,065 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils            - TaskManager will try to connect for 10000 milliseconds before falling back to heuristics
    2017-11-02 14:06:51,067 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Retrieved new target address jobmanager/10.0.0.2:6123.
    2017-11-02 14:06:54,578 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Trying to connect to address jobmanager/10.0.0.2:6123
    2017-11-02 14:06:54,779 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '00afd4130a94/10.0.0.5': connect timed out
    2017-11-02 14:06:54,829 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.0.0.5': connect timed out
    2017-11-02 14:06:54,880 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.0.0.4': connect timed out
    2017-11-02 14:06:54,931 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/172.18.0.3': connect timed out
    2017-11-02 14:06:54,981 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.0.0.5': connect timed out
    2017-11-02 14:06:55,031 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.0.0.4': connect timed out
    2017-11-02 14:06:55,032 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed)
    2017-11-02 14:06:56,034 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/172.18.0.3': connect timed out
    2017-11-02 14:06:57,036 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.0.0.5': connect timed out
    2017-11-02 14:06:58,037 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.0.0.4': connect timed out
    2017-11-02 14:06:58,038 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed)
    2017-11-02 14:06:58,138 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Trying to connect to address jobmanager/10.0.0.2:6123
    2017-11-02 14:06:58,339 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '00afd4130a94/10.0.0.5': connect timed out
    2017-11-02 14:06:58,389 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.0.0.5': connect timed out
    2017-11-02 14:06:58,439 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.0.0.4': connect timed out
    2017-11-02 14:06:58,490 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/172.18.0.3': connect timed out
    2017-11-02 14:06:58,541 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.0.0.5': connect timed out
    2017-11-02 14:06:58,592 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.0.0.4': connect timed out
    2017-11-02 14:06:58,592 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed)
    2017-11-02 14:06:59,593 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/172.18.0.3': connect timed out
    2017-11-02 14:07:00,595 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.0.0.5': connect timed out
    2017-11-02 14:07:01,599 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.0.0.4': connect timed out
    2017-11-02 14:07:01,599 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed)
    2017-11-02 14:07:01,600 WARN  org.apache.flink.runtime.net.ConnectionUtils                  - Could not connect to jobmanager/10.0.0.2:6123. Selecting a local address using heuristics.
    2017-11-02 14:07:01,601 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - TaskManager will use hostname/address '00afd4130a94' (10.0.0.5) for communication.
    2017-11-02 14:07:01,601 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Starting TaskManager
    2017-11-02 14:07:01,601 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Starting TaskManager actor system at 00afd4130a94:0.
    2017-11-02 14:07:01,947 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
    2017-11-02 14:07:01,978 INFO  Remoting                                                      - Starting remoting
    2017-11-02 14:07:02,168 INFO  Remoting                                                      - Remoting started; listening on addresses :[akka.tcp://flink@00afd4130a94:33881]
    2017-11-02 14:07:02,174 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Starting TaskManager actor
    2017-11-02 14:07:02,192 INFO  org.apache.flink.runtime.io.network.netty.NettyConfig         - NettyConfig [server address: 00afd4130a94/10.0.0.5, server port: 0, ssl enabled: false, memory segment size (bytes): 32768, transport type: NIO, number of server threads: 2 (manual), number of client threads: 2 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
    2017-11-02 14:07:02,199 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration  - Messages have a max timeout of 10000 ms
    2017-11-02 14:07:02,201 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerServices     - Temporary file directory '/tmp': total 29 GB, usable 25 GB (86.21% usable)
    2017-11-02 14:07:02,286 INFO  org.apache.flink.runtime.io.network.buffer.NetworkBufferPool  - Allocated 101 MB for network buffer pool (number of memory segments: 3260, bytes per segment: 32768).
    2017-11-02 14:07:02,393 INFO  org.apache.flink.runtime.io.network.NetworkEnvironment        - Starting the network environment and its components.
    2017-11-02 14:07:02,400 INFO  org.apache.flink.runtime.io.network.netty.NettyClient         - Successful initialization (took 2 ms).
    2017-11-02 14:07:02,434 INFO  org.apache.flink.runtime.io.network.netty.NettyServer         - Successful initialization (took 32 ms). Listening on SocketAddress /10.0.0.5:42921.
    2017-11-02 14:07:02,493 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerServices     - Limiting managed memory to 0.7 of the currently free heap space (640 MB), memory will be allocated lazily.
    2017-11-02 14:07:02,498 INFO  org.apache.flink.runtime.io.disk.iomanager.IOManager          - I/O manager uses directory /tmp/flink-io-e57d51fa-2269-4df0-9910-0fe26c6042bd for spill files.
    2017-11-02 14:07:02,501 INFO  org.apache.flink.runtime.metrics.MetricRegistry               - No metrics reporter configured, no metrics will be exposed/reported.
    2017-11-02 14:07:02,553 INFO  org.apache.flink.runtime.filecache.FileCache                  - User file cache uses directory /tmp/flink-dist-cache-2c0c063f-464e-48f1-9fb8-fcfa48868e3a
    2017-11-02 14:07:02,564 INFO  org.apache.flink.runtime.filecache.FileCache                  - User file cache uses directory /tmp/flink-dist-cache-0c5e2b25-70a2-4964-9eec-24b0e79d560e
    2017-11-02 14:07:02,572 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Starting TaskManager actor at akka://flink/user/taskmanager#1719715507.
    2017-11-02 14:07:02,572 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - TaskManager data connection information: df5992297d269fa16a5e945e1dce0451 @ 00afd4130a94 (dataPort=42921)
    2017-11-02 14:07:02,573 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - TaskManager has 2 task slot(s).
    2017-11-02 14:07:02,574 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Memory usage stats: [HEAP: 113/1024/1024 MB, NON HEAP: 33/33/-1 MB (used/committed/max)]
    2017-11-02 14:07:02,576 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager (attempt 1, timeout: 500 milliseconds)
    2017-11-02 14:07:03,106 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager (attempt 2, timeout: 1000 milliseconds)
    2017-11-02 14:07:04,126 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager (attempt 3, timeout: 2000 milliseconds)
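
In the log above, the name jobmanager resolves (to 10.0.0.2), but every TCP connection attempt to port 6123 times out. A minimal sketch for repeating that check by hand from a shell inside the taskmanager container is shown here. It assumes bash and the coreutils `timeout` command are present in the image, which I have not verified; `probe_tcp` is a hypothetical helper name, not part of Flink or Docker.

```shell
# probe_tcp HOST PORT [TIMEOUT_SECONDS]
# Hypothetical helper: exits 0 if a TCP connection to HOST:PORT can be
# opened within the timeout (default 3 seconds), non-zero otherwise.
probe_tcp() {
  local host="$1" port="$2" secs="${3:-3}"
  # bash's /dev/tcp pseudo-path opens a TCP socket on redirection;
  # `timeout` bounds how long the connect attempt may hang.
  # Inside the child shell, $0 is the host and $1 is the port.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, `probe_tcp jobmanager 6123 3 && echo reachable || echo "not reachable"` would test the same address and port the TaskManager is trying to reach. Running the same probe from a container on the manager node might help tell a cross-node overlay problem apart from a problem with the JobManager itself.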

Here are the logs from the container running the jobmanager:

Starting Job Manager
config file:
jobmanager.rpc.address: jobmanager
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 1024
taskmanager.numberOfTaskSlots: 1
taskmanager.memory.preallocate: false
parallelism.default: 1
jobmanager.web.port: 8081
blob.server.port: 6124
query.server.port: 6125
Starting jobmanager as a console application on host c30e0fe7b765.
2017-11-02 13:42:33,721 WARN  org.apache.hadoop.util.NativeCodeLoader                       - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-11-02 13:42:33,796 INFO  org.apache.flink.runtime.jobmanager.JobManager                - --------------------------------------------------------------------------------
2017-11-02 13:42:33,796 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  Starting JobManager (Version: 1.3.2, Rev:0399bee, Date:03.08.2017 @ 10:23:11 UTC)
2017-11-02 13:42:33,796 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  Current user: flink
2017-11-02 13:42:33,796 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.141-b15
2017-11-02 13:42:33,796 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  Maximum heap size: 981 MiBytes
2017-11-02 13:42:33,796 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  JAVA_HOME: /docker-java-home/jre
2017-11-02 13:42:33,799 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  Hadoop version: 2.7.2
2017-11-02 13:42:33,800 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  JVM Options:
2017-11-02 13:42:33,800 INFO  org.apache.flink.runtime.jobmanager.JobManager                -     -Xms1024m
2017-11-02 13:42:33,800 INFO  org.apache.flink.runtime.jobmanager.JobManager                -     -Xmx1024m
2017-11-02 13:42:33,800 INFO  org.apache.flink.runtime.jobmanager.JobManager                -     -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
2017-11-02 13:42:33,800 INFO  org.apache.flink.runtime.jobmanager.JobManager                -     -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
2017-11-02 13:42:33,800 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  Program Arguments:
2017-11-02 13:42:33,800 INFO  org.apache.flink.runtime.jobmanager.JobManager                -     --configDir
2017-11-02 13:42:33,800 INFO  org.apache.flink.runtime.jobmanager.JobManager                -     /opt/flink/conf
2017-11-02 13:42:33,800 INFO  org.apache.flink.runtime.jobmanager.JobManager                -     --executionMode
2017-11-02 13:42:33,800 INFO  org.apache.flink.runtime.jobmanager.JobManager                -     cluster
2017-11-02 13:42:33,800 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  Classpath: /opt/flink/lib/flink-python_2.11-1.3.2.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.3.2.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.7.jar:/opt/flink/lib/flink-dist_2.11-1.3.2.jar:::
2017-11-02 13:42:33,801 INFO  org.apache.flink.runtime.jobmanager.JobManager                - --------------------------------------------------------------------------------
2017-11-02 13:42:33,801 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Registered UNIX signal handlers for [TERM, HUP, INT]
2017-11-02 13:42:33,911 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Loading configuration from /opt/flink/conf
2017-11-02 13:42:33,914 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.address, jobmanager
2017-11-02 13:42:33,915 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.port, 6123
2017-11-02 13:42:33,915 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.heap.mb, 1024
2017-11-02 13:42:33,915 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.heap.mb, 1024
2017-11-02 13:42:33,915 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2017-11-02 13:42:33,915 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.memory.preallocate, false
2017-11-02 13:42:33,916 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: parallelism.default, 1
2017-11-02 13:42:33,916 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.web.port, 8081
2017-11-02 13:42:33,917 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: blob.server.port, 6124
2017-11-02 13:42:33,917 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: query.server.port, 6125
2017-11-02 13:42:33,924 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Starting JobManager without high-availability
2017-11-02 13:42:33,926 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Starting JobManager on jobmanager:6123 with execution mode CLUSTER
2017-11-02 13:42:33,934 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.address, jobmanager
2017-11-02 13:42:33,934 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.port, 6123
2017-11-02 13:42:33,934 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.heap.mb, 1024
2017-11-02 13:42:33,934 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.heap.mb, 1024
2017-11-02 13:42:33,935 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2017-11-02 13:42:33,935 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.memory.preallocate, false
2017-11-02 13:42:33,935 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: parallelism.default, 1
2017-11-02 13:42:33,935 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.web.port, 8081
2017-11-02 13:42:33,936 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: blob.server.port, 6124
2017-11-02 13:42:33,936 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: query.server.port, 6125
2017-11-02 13:42:33,962 INFO  org.apache.flink.runtime.security.modules.HadoopModule        - Hadoop user set to flink (auth:SIMPLE)
2017-11-02 13:42:34,026 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Starting JobManager actor system reachable at jobmanager:6123
2017-11-02 13:42:34,290 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
2017-11-02 13:42:34,327 INFO  Remoting                                                      - Starting remoting
2017-11-02 13:42:34,505 INFO  Remoting                                                      - Remoting started; listening on addresses :[akka.tcp://flink@jobmanager:6123]
2017-11-02 13:42:34,524 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Starting JobManager web frontend
2017-11-02 13:42:34,532 WARN  org.apache.flink.runtime.webmonitor.WebMonitorUtils           - Log file environment variable 'log.file' is not set.
2017-11-02 13:42:34,532 WARN  org.apache.flink.runtime.webmonitor.WebMonitorUtils           - JobManager log files are unavailable in the web dashboard. Log file location not found in environment variable 'log.file' or configuration key 'jobmanager.web.log.path'.
2017-11-02 13:42:34,532 INFO  org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Using directory /tmp/flink-web-9f0ba581-3488-4086-a79c-53e17b56352c for the web interface files
2017-11-02 13:42:34,533 INFO  org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Using directory /tmp/flink-web-17a58ccf-7d8b-475e-b727-4a7935a19c0f for web frontend JAR file uploads
2017-11-02 13:42:34,741 INFO  org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Web frontend listening at 0:0:0:0:0:0:0:0:8081
2017-11-02 13:42:34,741 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Starting JobManager actor
2017-11-02 13:42:34,751 INFO  org.apache.flink.runtime.blob.BlobServer                      - Created BLOB server storage directory /tmp/blobStore-d10b620a-73ae-40af-bd23-aad5211fe1cc
2017-11-02 13:42:34,752 INFO  org.apache.flink.runtime.blob.BlobServer                      - Started BLOB server at 0.0.0.0:6124 - max concurrent requests: 50 - max backlog: 1000
2017-11-02 13:42:34,763 INFO  org.apache.flink.runtime.metrics.MetricRegistry               - No metrics reporter configured, no metrics will be exposed/reported.
2017-11-02 13:42:34,769 INFO  org.apache.flink.runtime.jobmanager.MemoryArchivist           - Started memory archivist akka://flink/user/archive
2017-11-02 13:42:34,774 INFO  org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Starting with JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager on port 8081
2017-11-02 13:42:34,774 INFO  org.apache.flink.runtime.webmonitor.JobManagerRetriever       - New leader reachable under akka.tcp://flink@jobmanager:6123/user/jobmanager:00000000-0000-0000-0000-000000000000.
2017-11-02 13:42:34,776 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Starting JobManager at akka.tcp://flink@jobmanager:6123/user/jobmanager.
2017-11-02 13:42:34,785 INFO  org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager  - Trying to associate with JobManager leader akka.tcp://flink@jobmanager:6123/user/jobmanager
2017-11-02 13:42:34,801 INFO  org.apache.flink.runtime.jobmanager.JobManager                - JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager was granted leadership with leader session ID Some(00000000-0000-0000-0000-000000000000).
2017-11-02 13:42:34,814 INFO  org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager  - Resource Manager associating with leading JobManager Actor[akka://flink/user/jobmanager#844712453] - leader session 00000000-0000-0000-0000-000000000000

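Why can't the TaskManager talk to the JobManager? I am not sure whether some configuration is missing. Any help would be greatly appreciated. Thank you very much!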
