Flink not starting TaskManagers on Kubernetes: job already reached a globally-terminal state

Posted by jm81lzqq on 2023-01-28 in Apache

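I have deployed a Flink cluster to Kubernetes, but I only see the JobManager running.
I had Flink running on another Kubernetes cluster and created a savepoint using the FlinkDeployment from the Flink Operator. The savepoint was saved correctly. I then deployed the Flink application to the new Kubernetes cluster and patched the savepoint LocationPath in the FlinkDeployment.
The Flink pod now logs this error: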

WARN  org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Ignoring JobGraph submission 'Windchill ESI Post Processing' because the job already reached a globally-terminal state (i.e. FAILED, CANCELED, FINISHED) in a previous execution.
...
io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LockException: Unable to update ConfigMapLock
...
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: PUT at: https://10.0.0.1/api/v1/namespaces/post-processing-int2/configmaps/post-processing-cluster-config-map. Message: Operation cannot be fulfilled on configmaps "post-processing-cluster-config-map": the object has been modified; please apply your changes to the latest version and try again. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=configmaps, name=post-processing-cluster-config-map, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Operation cannot be fulfilled on configmaps "post-processing-cluster-config-map": the object has been modified; please apply your changes to the latest version and try again, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Conflict, status=Failure, additionalProperties={}).

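The ConfigMap mentioned in the error does exist.
My question is: how do I start a new TaskManager now? I have set numberOfTaskSlots: 4. I tried opening a shell in the JobManager pod and running bin/taskmanager.sh start, but that only started a process inside that pod, which did not seem right, so I stopped it.
I expect to see new TaskManager pods start. Thanks.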

apache-flink

Source: https://stackoverflow.com/questions/75132589/flink-not-starting-taskmanagers-on-kubernetes-job-reached-global-terminal-state

1 Answer

r8uurelv1#

The clue is in the first line of the log:

WARN  org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Ignoring JobGraph submission 'Windchill ESI Post Processing' because the job already reached a globally-terminal state (i.e. FAILED, CANCELED, FINISHED) in a previous execution

My mistake started with this command:

kubectl patch flinkdeployment/<name-of-flink-deployment> --type=merge -p '{"spec": {"job": {"state": "suspended", "upgradeMode": "savepoint"}}}'

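The problem is the upgradeMode. It should not be edited; leave it as last-state. The last-state mode uses the HA state (in my case, state stored in Azure Blob Storage) to tell the Flink deployment to resume from where it left off. Setting it to savepoint leaves the deployment in the FINISHED state, so no new TaskManagers are started on deployment.
Here is the correct edit: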

kubectl patch flinkdeployment/<name-of-flink-deployment> --type=merge -p '{"spec": {"job": {"state": "suspended"}}}'
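To bring the job back after suspending it, the same kind of merge patch can set spec.job.state to running, which is the other value the Flink Kubernetes Operator accepts for that field. The sketch below only builds and prints the commands so they can be reviewed before running. The deployment name is a hypothetical placeholder, the namespace is the one that appears in the log above, and the helper function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only. DEPLOYMENT is a hypothetical placeholder; NAMESPACE is the
# namespace seen in the log above. Override both for your own cluster.
DEPLOYMENT="${DEPLOYMENT:-my-flink-deployment}"
NAMESPACE="${NAMESPACE:-post-processing-int2}"

# Build a merge patch that changes only spec.job.state.
# upgradeMode is deliberately left out so it stays at last-state.
job_state_patch() {
  printf '{"spec": {"job": {"state": "%s"}}}' "$1"
}

# Print the kubectl command instead of executing it.
patch_command() {
  printf "kubectl -n %s patch flinkdeployment/%s --type=merge -p '%s'\n" \
    "$NAMESPACE" "$DEPLOYMENT" "$(job_state_patch "$1")"
}

patch_command suspended   # stop the job, keeping upgradeMode untouched
patch_command running     # resume the job later
```

After applying the running patch, listing the pods in the namespace with kubectl get pods -n <namespace> is one way to check whether TaskManager pods appear. Whether they do depends on the HA state being intact, as described above.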

Answered on 2023-01-28
