flink-timeoutexception:id为someid的taskmanager的心跳超时

cgyqldqp  于 2021-06-21  发布在  Flink
关注(0)|答案(0)|浏览(709)

虽然我在本地运行可执行jar(使用java-jar命令),但它与这个问题类似。我使用180g作为jvm堆的最大大小,在大约37分钟后,异常被抛出到大约110g的使用量

java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id someId timed out.

在上面的问题中提到“也许jobmanager的内存太小了”。如何将其转换为本地执行?只是我用xmx配置的内存吗,就像我在上面写的(180g)?
我在其他地方读到可以配置taskmanager的心跳间隔?在本地执行中我该怎么做?
(编辑)在哪里可以找到本地执行中任务管理器的日志?
顺便说一下,我用的是flink1.9
编辑的日志:

18:28:33.993 [flink-akka.actor.default-dispatcher-533] DEBUG o.a.f.r.r.StandaloneResourceManager - Trigger heartbeat request.
18:28:33.993 [flink-akka.actor.default-dispatcher-532] DEBUG o.a.f.runtime.jobmaster.JobMaster - Trigger heartbeat request.
18:28:33.993 [flink-akka.actor.default-dispatcher-533] DEBUG o.a.f.r.r.StandaloneResourceManager - Trigger heartbeat request.
18:28:33.993 [flink-akka.actor.default-dispatcher-532] DEBUG o.a.f.runtime.jobmaster.JobMaster - Received heartbeat request from 2ac3bc91e5eb2b7e41baf47d97764652.
18:28:33.993 [flink-akka.actor.default-dispatcher-530] DEBUG o.a.f.r.taskexecutor.TaskExecutor - Received heartbeat request from 966e2f0da4aa05748d6c7921fb02819e.
18:28:33.993 [flink-akka.actor.default-dispatcher-533] DEBUG o.a.f.r.r.StandaloneResourceManager - Received heartbeat from 966e2f0da4aa05748d6c7921fb02819e.
18:28:33.993 [flink-akka.actor.default-dispatcher-533] DEBUG o.a.f.runtime.jobmaster.JobMaster - Received heartbeat from 0895b99a-27b3-43da-8226-16031db924de.
18:28:33.993 [flink-akka.actor.default-dispatcher-530] DEBUG o.a.f.r.taskexecutor.TaskExecutor - Received heartbeat request from 2ac3bc91e5eb2b7e41baf47d97764652.
18:28:33.993 [flink-akka.actor.default-dispatcher-532] DEBUG o.a.f.r.r.StandaloneResourceManager - Received heartbeat from 0895b99a-27b3-43da-8226-16031db924de.
18:28:33.993 [flink-akka.actor.default-dispatcher-532] DEBUG o.a.f.r.r.s.SlotManagerImpl - Received slot report from instance 899db08270ab48abff5e3111979c6f07: SlotReport{slotsStatus=[SlotStatus{slotID=0895b99a-27b3-43da-8226-16031db924de_0, resourceProfile=ResourceProfile{cpuCores=1.7976931348623157E308, heapMemoryInMB=2147483647, directMemoryInMB=2147483647, nativeMemoryInMB=2147483647, networkMemoryInMB=2147483647, managedMemoryInMB=165639}, allocationID=8b473e13eadfa5d9b8e34e9020536b5e, jobID=6109483ea411f1c099d0bb18a5995742}]}.
18:29:44.070 [flink-akka.actor.default-dispatcher-534] DEBUG o.a.f.runtime.jobmaster.JobMaster - Trigger heartbeat request.
18:29:44.070 [flink-akka.actor.default-dispatcher-535] DEBUG o.a.f.r.r.StandaloneResourceManager - Trigger heartbeat request.
18:29:44.071 [flink-akka.actor.default-dispatcher-535] DEBUG o.a.f.r.r.StandaloneResourceManager - Trigger heartbeat request.
18:29:44.071 [flink-akka.actor.default-dispatcher-536] DEBUG o.a.f.r.taskexecutor.TaskExecutor - Received heartbeat request from 966e2f0da4aa05748d6c7921fb02819e.
18:29:44.071 [flink-akka.actor.default-dispatcher-534] DEBUG o.a.f.runtime.jobmaster.JobMaster - Received heartbeat request from 2ac3bc91e5eb2b7e41baf47d97764652.
18:29:44.071 [flink-akka.actor.default-dispatcher-535] INFO  o.a.f.r.r.StandaloneResourceManager - The heartbeat of JobManager with id 966e2f0da4aa05748d6c7921fb02819e timed out.
18:29:44.071 [flink-akka.actor.default-dispatcher-534] DEBUG o.a.f.r.j.slotpool.SlotPoolImpl - Slot Pool Status:
    status: connected to akka://flink/user/resourcemanager
    registered TaskManagers: [0895b99a-27b3-43da-8226-16031db924de]
    available slots: []
    allocated slots: [[AllocatedSlot 8b473e13eadfa5d9b8e34e9020536b5e @ 0895b99a-27b3-43da-8226-16031db924de @ localhost (dataPort=-1) - 0]]
    pending requests: []
    }

18:29:44.071 [flink-akka.actor.default-dispatcher-535] INFO  o.a.f.r.r.StandaloneResourceManager - Disconnect job manager afeaa9d79956984a8a830ef4007a4315@akka://flink/user/jobmanager_1 for job 6109483ea411f1c099d0bb18a5995742 from the resource manager.
18:29:44.071 [flink-akka.actor.default-dispatcher-535] INFO  o.a.f.r.r.StandaloneResourceManager - The heartbeat of TaskManager with id 0895b99a-27b3-43da-8226-16031db924de timed out.
18:29:44.071 [flink-akka.actor.default-dispatcher-536] DEBUG o.a.f.r.taskexecutor.TaskExecutor - Received heartbeat request from 2ac3bc91e5eb2b7e41baf47d97764652.
18:29:44.071 [flink-akka.actor.default-dispatcher-535] INFO  o.a.f.r.r.StandaloneResourceManager - Closing TaskExecutor connection 0895b99a-27b3-43da-8226-16031db924de because: The heartbeat of TaskManager with id 0895b99a-27b3-43da-8226-16031db924de  timed out.
18:29:44.071 [flink-akka.actor.default-dispatcher-535] DEBUG o.a.f.r.r.s.SlotManagerImpl - Unregister TaskManager 899db08270ab48abff5e3111979c6f07 from the SlotManager.
18:29:44.071 [flink-akka.actor.default-dispatcher-534] DEBUG o.a.f.runtime.jobmaster.JobMaster - Disconnect TaskExecutor 0895b99a-27b3-43da-8226-16031db924de because: Heartbeat of TaskManager with id 0895b99a-27b3-43da-8226-16031db924de timed out.
18:29:44.071 [flink-akka.actor.default-dispatcher-535] DEBUG o.a.f.r.r.StandaloneResourceManager - Received heartbeat from 966e2f0da4aa05748d6c7921fb02819e.
18:29:44.071 [flink-akka.actor.default-dispatcher-535] DEBUG o.a.f.r.r.StandaloneResourceManager - Received heartbeat from 0895b99a-27b3-43da-8226-16031db924de.
18:29:44.071 [flink-akka.actor.default-dispatcher-535] DEBUG o.a.f.r.r.StandaloneResourceManager - Received slot report from TaskManager 0895b99a-27b3-43da-8226-16031db924de which is no longer registered.
18:29:44.071 [flink-akka.actor.default-dispatcher-536] DEBUG o.a.f.r.taskexecutor.TaskExecutor - Close ResourceManager connection 2ac3bc91e5eb2b7e41baf47d97764652.
java.util.concurrent.TimeoutException: The heartbeat of TaskManager with id 0895b99a-27b3-43da-8226-16031db924de  timed out.
    at org.apache.flink.runtime.resourcemanager.ResourceManager$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(ResourceManager.java:1144)
    at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:318)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
    at scala.PartialFunction.applyOrElse(PartialFunction.scala:127)
    at scala.PartialFunction.applyOrElse$(PartialFunction.scala:126)
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
    at akka.actor.Actor.aroundReceive(Actor.scala:517)
    at akka.actor.Actor.aroundReceive$(Actor.scala:515)
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
    at akka.actor.ActorCell.invoke(ActorCell.scala:561)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
    at akka.dispatch.Mailbox.run(Mailbox.scala:225)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
18:29:44.072 [flink-akka.actor.default-dispatcher-534] INFO  o.a.f.r.e.ExecutionGraph - Map (1/1) (f6b46e7b79dc5c196c14072ff765354f) switched from RUNNING to FAILED.
java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id 0895b99a-27b3-43da-8226-16031db924de timed out.
    at org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1149)
    at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:318)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
    at scala.PartialFunction.applyOrElse(PartialFunction.scala:127)
    at scala.PartialFunction.applyOrElse$(PartialFunction.scala:126)
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
    at akka.actor.Actor.aroundReceive(Actor.scala:517)
    at akka.actor.Actor.aroundReceive$(Actor.scala:515)
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
    at akka.actor.ActorCell.invoke(ActorCell.scala:561)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
    at akka.dispatch.Mailbox.run(Mailbox.scala:225)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
18:29:44.072 [flink-akka.actor.default-dispatcher-535] DEBUG o.a.f.r.r.StandaloneResourceManager - No open TaskExecutor connection 0895b99a-27b3-43da-8226-16031db924de. Ignoring close TaskExecutor connection. Closing reason was: The heartbeat of TaskManager with id 0895b99a-27b3-43da-8226-16031db924de  timed out.
18:29:44.072 [flink-akka.actor.default-dispatcher-536] INFO  o.a.f.r.taskexecutor.TaskExecutor - Connecting to ResourceManager akka://flink/user/resourcemanager(bcfecd2d9083fd495b6f80b0a30e4507).
18:29:44.072 [flink-akka.actor.default-dispatcher-536] DEBUG o.a.f.r.rpc.akka.AkkaRpcService - Try to connect to remote RPC endpoint with address akka://flink/user/resourcemanager. Returning a org.apache.flink.runtime.resourcemanager.ResourceManagerGateway gateway.
18:29:44.072 [flink-akka.actor.default-dispatcher-534] INFO  o.a.f.r.e.ExecutionGraph - Job Flink Streaming Job (6109483ea411f1c099d0bb18a5995742) switched from state RUNNING to FAILING.
java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id 0895b99a-27b3-43da-8226-16031db924de timed out.
    at org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1149)
    at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:318)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
    at scala.PartialFunction.applyOrElse(PartialFunction.scala:127)
    at scala.PartialFunction.applyOrElse$(PartialFunction.scala:126)
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
    at akka.actor.Actor.aroundReceive(Actor.scala:517)
    at akka.actor.Actor.aroundReceive$(Actor.scala:515)
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
    at akka.actor.ActorCell.invoke(ActorCell.scala:561)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
    at akka.dispatch.Mailbox.run(Mailbox.scala:225)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

暂无答案!

目前还没有任何答案,快来回答吧!

相关问题