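I have configured Flink in HA mode, as described below:
I want to test fault tolerance, so I did the following:
Set up a Flink cluster with 2 job managers and 1 task manager
Started a streaming job on the task manager
Killed the active job manager (to simulate a crash)
Leader election takes place as expected.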
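But the task manager does not reconnect to the new job manager. It just keeps trying to reconnect to the previous leader every 10 seconds.
Here is the task manager log: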
2018-07-25 19:46:08,508 INFO org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration - Messages have a max timeout of 10000 ms
2018-07-25 19:46:08,515 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at akka://flink/user/taskmanager_0 .
2018-07-25 19:46:08,524 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
2018-07-25 19:46:08,525 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Start job leader service.
2018-07-25 19:46:08,529 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Connecting to ResourceManager akka.tcp://flink@10.10.97.210:46477/user/resourcemanager(b91b9aeb3565be973c9bb47259414e0a).
2018-07-25 19:46:08,574 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /10.10.97.210:46477
2018-07-25 19:46:08,576 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.10.97.210:46477] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@10.10.97.210:46477]] Caused by: [Connection refused: /10.10.97.210:46477]
2018-07-25 19:46:08,579 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@10.10.97.210:46477/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.10.97.210:46477/user/resourcemanager..
2018-07-25 19:46:18,606 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /10.10.97.210:46477
2018-07-25 19:46:18,607 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.10.97.210:46477] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@10.10.97.210:46477]] Caused by: [Connection refused: /10.10.97.210:46477]
2018-07-25 19:46:18,607 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@10.10.97.210:46477/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.10.97.210:46477/user/resourcemanager..
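Restarting the task manager did not help.
Restarting the cluster did not help either.
If I am missing something, please point me in the right direction.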
1 Answer
Looking at the log:
Connection refused: /10.10.97.210:46477
Is port 46477 open in the firewall (or excluded from its rules)?
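A quick way to check this from the task manager host is to attempt a plain TCP connection to the address in the log. The following is only a minimal sketch; `port_is_reachable` is a hypothetical helper name, and the timeout value is an arbitrary choice.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration; it only tests TCP reachability,
    not whether a Flink RPC endpoint is actually answering on that port.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unreachable hosts.
        return False
```

For example, `port_is_reachable("10.10.97.210", 46477)` run on the task manager machine should return `True` once the port is open and a job manager is listening on it. A `False` result does not distinguish between a blocked port and a process that is simply no longer running there.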
Just check whether the following is set in your Flink config:
Then open those ports.
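The setting itself is not shown in this post. One likely candidate (an assumption on my part) is `high-availability.jobmanager.port`, which in HA setups can be pinned to a fixed port or a range instead of a random port such as the 46477 seen in the log. Assuming you choose a range like `50000-50025` (a made-up example value), the sketch below expands such a spec and reports which ports cannot be reached. The helper names and the comma/range syntax handled here are illustrative, not Flink's own parser.

```python
import socket
from typing import List


def expand_port_spec(spec: str) -> List[int]:
    """Expand a spec such as "6123, 50000-50025" into a list of port numbers.

    Hypothetical helper: handles comma-separated single ports and
    inclusive "low-high" ranges only.
    """
    ports: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = (int(p) for p in part.split("-", 1))
            if low > high:
                raise ValueError(f"invalid port range: {part}")
            ports.extend(range(low, high + 1))
        else:
            ports.append(int(part))
    return ports


def unreachable_ports(host: str, spec: str, timeout: float = 1.0) -> List[int]:
    """Return the ports from `spec` on `host` that refuse or time out on TCP connect."""
    failed: List[int] = []
    for port in expand_port_spec(spec):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            failed.append(port)
    return failed
```

Note that only the port the current leader actually bound will accept connections, so most ports in the range will show up as unreachable even with the firewall open. The check is mainly useful to confirm that at least one port in the configured range answers from the task manager host.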