I am running the standard Flink Docker project from: https://github.com/apache/flink/tree/master/flink-contrib/docker-flink
The machines that make up the swarm are in different clouds: Azure and Google Cloud.
Steps to reproduce:
Create the swarm: docker swarm init --advertise-addr XXXXXX
Create the registry: docker service create --name registry --publish 5000:5000 registry:2
Add all machines to the swarm using the worker token from above. docker node ls shows all machines as "Ready".
Push the image to the registry: docker-compose push
Deploy the Flink services to the swarm: docker stack deploy --compose-file docker-compose.yml flink
Scale the Flink service: docker service scale flink_taskmanager=20
Keep checking: docker service ps flink_taskmanager | grep Running
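Docker Swarm tries to start flink_taskmanager on all machines, but the machines that are not in the same virtual network/subnet as the container running flink_jobmanager fail with the following error: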
2017-10-31 22:37:32,255 WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:flink (auth:SIMPLE) cause:java.net.UnknownHostException: jobmanager: Name or service not known
2017-10-31 22:37:32,256 ERROR org.apache.flink.runtime.taskmanager.TaskManager - Failed to run TaskManager.
java.net.UnknownHostException: jobmanager: Name or service not known
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
at java.net.InetAddress.getAllByName(InetAddress.java:1192)
at java.net.InetAddress.getAllByName(InetAddress.java:1126)
at java.net.InetAddress.getByName(InetAddress.java:1076)
at org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils.getRpcUrl(AkkaRpcServiceUtils.java:173)
at org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils.getRpcUrl(AkkaRpcServiceUtils.java:138)
at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:78)
at org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndRunTaskManager(TaskManager.scala:1663)
at org.apache.flink.runtime.taskmanager.TaskManager$$anon$2.call(TaskManager.scala:1574)
at org.apache.flink.runtime.taskmanager.TaskManager$$anon$2.call(TaskManager.scala:1572)
at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
at org.apache.flink.runtime.taskmanager.TaskManager$.main(TaskManager.scala:1572)
at org.apache.flink.runtime.taskmanager.TaskManager.main(TaskManager.scala)
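The stack trace above comes down to a failed hostname lookup inside the task manager container. As a rough way to repeat that lookup outside the JVM, a small Python sketch like the one below could be run in a container attached to the same network. It is only an illustration: the helper name can_resolve is hypothetical, it assumes Python is available in that container (the Flink image may not ship it), and the two hostnames are the ones that appear in the logs in this post.

```python
import socket


def can_resolve(hostname):
    """Return True if the local resolver maps hostname to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        # Same failure class the JVM reports as UnknownHostException.
        return False


if __name__ == "__main__":
    # Hostnames taken from the logs in this post; adjust to your service names.
    for name in ("jobmanager", "flink-jobmanager"):
        status = "resolves" if can_resolve(name) else "does NOT resolve"
        print(name, status)
```

If the name resolves on the Azure nodes but not on the Google Cloud node, that would be consistent with the behaviour described here, though it does not by itself say why the lookup fails.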
This is a few seconds after I scaled the task managers (before they failed):
thalita@ubuntu-swarm-manager:~/flink/flink-contrib/docker-flink$ docker service ps flink_taskmanager | grep Running
ktcy4ujro1yo flink_taskmanager.1 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-2 Running Running 4 seconds ago
qbpjoua6ctbg flink_taskmanager.2 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-2 Running Running 12 seconds ago
ymlripufi9qe flink_taskmanager.3 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-1 Running Running 16 seconds ago
xvfcqj2cnnph flink_taskmanager.4 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-1 Running Running 15 seconds ago
lwvkkz3mx7ij flink_taskmanager.6 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-2 Running Running 7 seconds ago
wrb78346dvmg flink_taskmanager.7 flink:1.3.2-hadoop2-scala_2.10 google-cloud-worker-1 Running Running 5 seconds ago
m31bf1cenevj flink_taskmanager.8 flink:1.3.2-hadoop2-scala_2.10 google-cloud-worker-1 Running Running 5 seconds ago
oe2ff8ijuer4 flink_taskmanager.9 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-2 Running Running 11 seconds ago
vuw3dxugyjyi flink_taskmanager.10 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-1 Running Running 15 seconds ago
xhmdbi9jad86 flink_taskmanager.11 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-2 Running Running 10 seconds ago
o3tw38bok4b9 flink_taskmanager.12 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-1 Running Running 10 minutes ago
knc54g7ayp1g flink_taskmanager.13 flink:1.3.2-hadoop2-scala_2.10 google-cloud-worker-1 Running Running 7 seconds ago
bqio2ubvik5j flink_taskmanager.14 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-1 Running Running 6 seconds ago
qauubxm3msda flink_taskmanager.15 flink:1.3.2-hadoop2-scala_2.10 google-cloud-worker-1 Running Running 5 seconds ago
v9hjfadfn9y6 flink_taskmanager.16 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-1 Running Running 4 seconds ago
d8oh7ol4g90y flink_taskmanager.17 flink:1.3.2-hadoop2-scala_2.10 google-cloud-worker-1 Running Running 3 seconds ago
9d4m7bb1bprp flink_taskmanager.18 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-2 Running Running 10 seconds ago
ri00r8ehvwsh flink_taskmanager.19 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-2 Running Running 5 seconds ago
A few seconds later, only the Azure nodes are running tasks:
docker service ps flink_taskmanager | grep Running
ktcy4ujro1yo flink_taskmanager.1 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-2 Running Running 2 minutes ago
qbpjoua6ctbg flink_taskmanager.2 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-2 Running Running 2 minutes ago
ymlripufi9qe flink_taskmanager.3 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-1 Running Running 2 minutes ago
xvfcqj2cnnph flink_taskmanager.4 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-1 Running Running 2 minutes ago
5efusat5ay60 flink_taskmanager.5 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-1 Running Running 2 minutes ago
lwvkkz3mx7ij flink_taskmanager.6 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-2 Running Running 2 minutes ago
v2vndema8k74 flink_taskmanager.7 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-1 Running Running 2 minutes ago
l92tjj0498v2 flink_taskmanager.8 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-2 Running Running 2 minutes ago
oe2ff8ijuer4 flink_taskmanager.9 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-2 Running Running 2 minutes ago
vuw3dxugyjyi flink_taskmanager.10 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-1 Running Running 2 minutes ago
xhmdbi9jad86 flink_taskmanager.11 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-2 Running Running 2 minutes ago
o3tw38bok4b9 flink_taskmanager.12 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-1 Running Running 12 minutes ago
6rlm2pu2gn21 flink_taskmanager.13 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-1 Running Running 2 minutes ago
bqio2ubvik5j flink_taskmanager.14 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-1 Running Running 2 minutes ago
63r9kmrh46gw flink_taskmanager.15 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-2 Running Running 2 minutes ago
v9hjfadfn9y6 flink_taskmanager.16 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-1 Running Running 2 minutes ago
vmrf20o9eo5m flink_taskmanager.17 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-1 Running Running 2 minutes ago
9d4m7bb1bprp flink_taskmanager.18 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-2 Running Running 2 minutes ago
ri00r8ehvwsh flink_taskmanager.19 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-2 Running Running 2 minutes ago
8h21y4r49scb flink_taskmanager.20 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-2 Running Running 2 minutes ago
That is also where the job manager is running:
docker service ps flink_jobmanager | grep Running
p6bzg567ewhn flink_jobmanager.1 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-1 Running Running about an hour ago
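To make the difference between the two task manager listings easier to see, the docker service ps output can be tallied per node. The following is a minimal Python sketch, not part of the Flink project. The function name running_tasks_per_node is hypothetical, and it assumes the whitespace-separated column order shown above (ID, NAME, IMAGE, NODE, DESIRED STATE, CURRENT STATE). The sample rows are copied from the first listing.

```python
from collections import Counter

# Four rows copied from the first `docker service ps` listing in this post.
SAMPLE = """\
ktcy4ujro1yo flink_taskmanager.1 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-2 Running Running 4 seconds ago
ymlripufi9qe flink_taskmanager.3 flink:1.3.2-hadoop2-scala_2.10 azure-swarm-worker-1 Running Running 16 seconds ago
wrb78346dvmg flink_taskmanager.7 flink:1.3.2-hadoop2-scala_2.10 google-cloud-worker-1 Running Running 5 seconds ago
m31bf1cenevj flink_taskmanager.8 flink:1.3.2-hadoop2-scala_2.10 google-cloud-worker-1 Running Running 5 seconds ago
"""


def running_tasks_per_node(ps_output):
    """Count tasks whose desired and current state are both Running, per node."""
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        # Assumed columns: ID, NAME, IMAGE, NODE, DESIRED STATE, CURRENT STATE, ...
        if len(fields) >= 6 and fields[4] == "Running" and fields[5] == "Running":
            counts[fields[3]] += 1
    return counts


if __name__ == "__main__":
    for node, count in sorted(running_tasks_per_node(SAMPLE).items()):
        print(node, count)
```

Feeding it the second listing instead would show only the two Azure workers, which matches what is described in this post.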
This is the docker log when I use the create-docker-swarm-service.sh script from https://github.com/apache/flink/tree/master/flink-contrib/docker-flink to create the services:
Starting Task Manager
config file:
jobmanager.rpc.address: flink-jobmanager
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 1024
taskmanager.numberOfTaskSlots: 2
taskmanager.memory.preallocate: false
parallelism.default: 1
jobmanager.web.port: 8081
blob.server.port: 6124
query.server.port: 6125
Starting taskmanager as a console application on host c42a6093f7bb.
2017-11-01 11:20:51,459 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - --------------------------------------------------------------------------------
2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager (Version: 1.3.2, Rev:0399bee, Date:03.08.2017 @ 10:23:11 UTC)
2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - Current user: flink
2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.141-b15
2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum heap size: 1024 MiBytes
2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - JAVA_HOME: /docker-java-home/jre
2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - Hadoop version: 2.7.2
2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM Options:
2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -XX:+UseG1GC
2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Xms1024M
2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Xmx1024M
2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -XX:MaxDirectMemorySize=8388607T
2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - Program Arguments:
2017-11-01 11:20:51,527 INFO org.apache.flink.runtime.taskmanager.TaskManager - --configDir
2017-11-01 11:20:51,527 INFO org.apache.flink.runtime.taskmanager.TaskManager - /opt/flink/conf
2017-11-01 11:20:51,527 INFO org.apache.flink.runtime.taskmanager.TaskManager - Classpath: /opt/flink/lib/flink-python_2.11-1.3.2.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.3.2.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.7.jar:/opt/flink/lib/flink-dist_2.11-1.3.2.jar:::
2017-11-01 11:20:51,527 INFO org.apache.flink.runtime.taskmanager.TaskManager - --------------------------------------------------------------------------------
2017-11-01 11:20:51,528 INFO org.apache.flink.runtime.taskmanager.TaskManager - Registered UNIX signal handlers for [TERM, HUP, INT]
2017-11-01 11:20:51,532 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum number of open file descriptors is 1048576
2017-11-01 11:20:51,548 INFO org.apache.flink.runtime.taskmanager.TaskManager - Loading configuration from /opt/flink/conf
2017-11-01 11:20:51,551 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, flink-jobmanager
2017-11-01 11:20:51,551 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-11-01 11:20:51,551 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 1024
2017-11-01 11:20:51,551 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 1024
2017-11-01 11:20:51,551 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 2
2017-11-01 11:20:51,551 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-11-01 11:20:51,552 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-11-01 11:20:51,552 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-11-01 11:20:51,552 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: blob.server.port, 6124
2017-11-01 11:20:51,553 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: query.server.port, 6125
2017-11-01 11:20:51,560 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, flink-jobmanager
2017-11-01 11:20:51,560 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-11-01 11:20:51,560 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 1024
2017-11-01 11:20:51,560 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 1024
2017-11-01 11:20:51,560 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 2
2017-11-01 11:20:51,560 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-11-01 11:20:51,561 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-11-01 11:20:51,561 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-11-01 11:20:51,561 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: blob.server.port, 6124
2017-11-01 11:20:51,561 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: query.server.port, 6125
2017-11-01 11:20:51,585 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to flink (auth:SIMPLE)
2017-11-01 11:20:51,621 ERROR org.apache.flink.runtime.taskmanager.TaskManager - Failed to run TaskManager.
java.net.UnknownHostException: flink-jobmanager: Name or service not known
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
at java.net.InetAddress.getAllByName(InetAddress.java:1192)
at java.net.InetAddress.getAllByName(InetAddress.java:1126)
at java.net.InetAddress.getByName(InetAddress.java:1076)
at org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils.getRpcUrl(AkkaRpcServiceUtils.java:173)
at org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils.getRpcUrl(AkkaRpcServiceUtils.java:138)
at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:78)
at org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndRunTaskManager(TaskManager.scala:1663)
at org.apache.flink.runtime.taskmanager.TaskManager$$anon$2.call(TaskManager.scala:1574)
at org.apache.flink.runtime.taskmanager.TaskManager$$anon$2.call(TaskManager.scala:1572)
at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
at org.apache.flink.runtime.taskmanager.TaskManager$.main(TaskManager.scala:1572)
at org.apache.flink.runtime.taskmanager.TaskManager.main(TaskManager.scala)