Flink on k8s error: job 00000000000000 is not in state RUNNING but SCHEDULED instead. Aborting checkpoint

k2arahey  posted on 2021-06-21  in  Flink

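When I deploy a Flink job to k8s with ZooKeeper HA, I get the error below.
Our setup is a job cluster with one job pod and one task pod. We want the job to keep running even if the task pod is deleted while the job is executing.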

job 00000000000000000000000000000000 is not in state RUNNING but SCHEDULED instead. Aborting checkpoint

Here is my configuration:

high-availability: zookeeper
high-availability.storageDir: file:///opt/flink/data/
high-availability.zookeeper.quorum: zk-0.zk-hs:2181,zk-1.zk-hs:2181,zk-2.zk-hs:2181
high-availability.zookeeper.client.acl: open
high-availability.zookeeper.path.root: /flinkha
high-availability.cluster-id: /flink-job-service-kpi-ofcwy

Here is the error log:

2020-06-19 12:56:02,254 INFO  org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore  - Recovering checkpoints from ZooKeeper.
2020-06-19 12:56:02,293 INFO  org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore  - Found 0 checkpoints in ZooKeeper.
2020-06-19 12:56:02,293 INFO  org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore  - Trying to fetch 0 checkpoints from storage.
2020-06-19 12:56:02,312 INFO  org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  - Starting ZooKeeperLeaderElectionService ZooKeeperLeaderElectionService{leaderPath='/leader/00000000000000000000000000000000/job_manager_lock'}.
2020-06-19 12:56:02,454 INFO  org.apache.flink.runtime.jobmaster.JobManagerRunner           - JobManager runner for job KPI service job (00000000000000000000000000000000) was granted leadership with session id 9644799b-29cf-4ec5-9e68-5e45261aefb2 at akka.tcp://flink@flink-job-service-kpi-ofcwy:35817/user/jobmanager_0.
2020-06-19 12:56:02,532 INFO  org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - Starting ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
2020-06-19 12:56:02,534 INFO  org.apache.flink.runtime.jobmaster.JobMaster                  - Starting execution of job KPI service job (00000000000000000000000000000000) under job master id 9e685e45261aefb29644799b29cf4ec5.
2020-06-19 12:56:02,552 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job KPI service job (00000000000000000000000000000000) switched from state CREATED to RUNNING.
2020-06-19 12:56:02,575 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: KPI-Kafka-Consumer -> (Sink: Print to Std. Out, Filter -> KPI Query Map -> KPI Unwind -> KPI Custom Map -> KPI filter -> KPI Data Transformation -> Filter) (1/1) (6aeaf74d5a4ee58579e79fa1d3026535) switched from CREATED to SCHEDULED.
2020-06-19 12:56:02,618 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl      - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{4abf5ce93cd365168228b616bd80ed71}]
2020-06-19 12:56:02,634 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Process -> Flat Map (1/1) (4ac2344f71fb9b6beb4a42fe18cf77a2) switched from CREATED to SCHEDULED.
2020-06-19 12:56:02,636 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Window(TumblingProcessingTimeWindows(60000), ProcessingTimeTrigger, DistinctCountAggregateFunction, PassThroughWindowFunction) -> Map (1/1) (1fbb13647621f5e48db6f7d750c32865) switched from CREATED to SCHEDULED.
2020-06-19 12:56:02,636 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Flat Map -> (Sink: Unnamed, Sink: Print to Std. Out) (1/1) (46396671fce9498171d03a31b1cee968) switched from CREATED to SCHEDULED.
2020-06-19 12:56:02,655 INFO  org.apache.flink.runtime.jobmaster.JobMaster                  - Connecting to ResourceManager akka.tcp://flink@flink-job-service-kpi-ofcwy:35817/user/resourcemanager(82039211570997fc83bd52bafb394879)
2020-06-19 12:56:02,674 INFO  org.apache.flink.runtime.jobmaster.JobMaster                  - Resolved ResourceManager address, beginning registration
2020-06-19 12:56:02,677 INFO  org.apache.flink.runtime.jobmaster.JobMaster                  - Registration at ResourceManager attempt 1 (timeout=100ms)
2020-06-19 12:56:02,692 INFO  org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - Starting ZooKeeperLeaderRetrievalService /leader/00000000000000000000000000000000/job_manager_lock.
2020-06-19 12:56:02,693 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - Registering job manager 9e685e45261aefb29644799b29cf4ec5@akka.tcp://flink@flink-job-service-kpi-ofcwy:35817/user/jobmanager_0 for job 00000000000000000000000000000000.
2020-06-19 12:56:02,753 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - Registered job manager 9e685e45261aefb29644799b29cf4ec5@akka.tcp://flink@flink-job-service-kpi-ofcwy:35817/user/jobmanager_0 for job 00000000000000000000000000000000.
2020-06-19 12:56:02,775 INFO  org.apache.flink.runtime.jobmaster.JobMaster                  - JobManager successfully registered at ResourceManager, leader id: 82039211570997fc83bd52bafb394879.
2020-06-19 12:56:02,775 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl      - Requesting new slot [SlotRequestId{4abf5ce93cd365168228b616bd80ed71}] and profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} from resource manager.
2020-06-19 12:56:02,777 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - Request slot with profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} for job 00000000000000000000000000000000 with allocation id dcc3d3f3537cd3f1032fe47a0aafe577.
2020-06-19 12:56:40,983 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Checkpoint triggering task Source: KPI-Kafka-Consumer -> (Sink: Print to Std. Out, Filter -> KPI Query Map -> KPI Unwind -> KPI Custom Map -> KPI filter -> KPI Data Transformation -> Filter) (1/1) of job 00000000000000000000000000000000 is not in state RUNNING but SCHEDULED instead. Aborting checkpoint.
2020-06-19 12:57:40,982 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Checkpoint triggering task Source: KPI-Kafka-Consumer -> (Sink: Print to Std. Out, Filter -> KPI Query Map -> KPI Unwind -> KPI Custom Map -> KPI filter -> KPI Data Transformation -> Filter) (1/1) of job 00000000000000000000000000000000 is not in state RUNNING but SCHEDULED instead. Aborting checkpoint.
qzlgjiam #1

Solved by configuring the service. The following setting was missing:

high-availability.jobmanager.port: 6070
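
As a rough illustration of what "configuring the service" could look like: in the log above the JobManager is reachable at flink-job-service-kpi-ofcwy:35817, which is probably a randomly chosen port, since in HA mode Flink uses high-availability.jobmanager.port instead of jobmanager.rpc.port for RPC. Pinning the port to 6070 lets a Kubernetes Service expose it. The sketch below assumes that flink-job-service-kpi-ofcwy is the name of a Kubernetes Service in front of the job pod; the selector labels and the port name are hypothetical and are not taken from the original post.

```yaml
# Minimal sketch of a Service exposing the pinned HA RPC port.
# Assumption: the host name in the log is a Kubernetes Service name.
apiVersion: v1
kind: Service
metadata:
  name: flink-job-service-kpi-ofcwy
spec:
  selector:
    app: flink              # hypothetical label, adjust to your job pod
    component: jobmanager   # hypothetical label, adjust to your job pod
  ports:
    - name: ha-rpc          # hypothetical port name
      port: 6070            # must match high-availability.jobmanager.port
      targetPort: 6070
```

With the port fixed and exposed this way, the task pod should be able to register with the JobManager through the Service, so the tasks can leave the SCHEDULED state and checkpoints are no longer aborted.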
