Kubernetes kube-controller-manager and kube-scheduler keep restarting. The pod logs are below.
~$ kubectl logs -n kube-system kube-scheduler-node1 -p
I1228 16:59:26.709076 1 serving.go:319] Generated self-signed cert in-memory
I1228 16:59:27.072726 1 server.go:143] Version: v1.16.0
I1228 16:59:27.072806 1 defaults.go:91] TaintNodesByCondition is enabled, PodToleratesNodeTaints predicate is mandatory
W1228 16:59:27.075087 1 authorization.go:47] Authorization is disabled
W1228 16:59:27.075103 1 authentication.go:79] Authentication is disabled
I1228 16:59:27.075117 1 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
I1228 16:59:27.075623 1 secure_serving.go:123] Serving securely on [::]:10259
I1228 16:59:28.077293 1 leaderelection.go:241] attempting to acquire leader lease kube-system/kube-scheduler...
E1228 16:59:45.353862 1 leaderelection.go:330] error retrieving resource lock kube-system/kube-scheduler: Get https://IPaddress/namespaces/kube-system/endpoints/kube-scheduler?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
I1228 16:59:47.969930 1 leaderelection.go:251] successfully acquired lease kube-system/kube-scheduler
I1228 17:00:42.008006 1 leaderelection.go:287] failed to renew lease kube-system/kube-scheduler: failed to tryAcquireOrRenew context deadline exceeded
F1228 17:00:42.008059 1 server.go:264] leaderelection lost
~$ kubectl logs -n kube-system kube-controller-manager-node1 -p
W1228 17:00:04.721378 1 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="node4" does not exist
I1228 17:00:04.726825 1 shared_informer.go:204] Caches are synced for certificate
I1228 17:00:04.732538 1 shared_informer.go:204] Caches are synced for TTL
I1228 17:00:04.739613 1 shared_informer.go:204] Caches are synced for ClusterRoleAggregator
I1228 17:00:04.754683 1 shared_informer.go:204] Caches are synced for certificate
I1228 17:00:04.760101 1 shared_informer.go:204] Caches are synced for stateful set
I1228 17:00:04.768974 1 shared_informer.go:204] Caches are synced for namespace
I1228 17:00:04.769914 1 shared_informer.go:204] Caches are synced for deployment
I1228 17:00:04.790541 1 shared_informer.go:204] Caches are synced for daemon sets
I1228 17:00:04.790710 1 shared_informer.go:204] Caches are synced for ReplicationController
I1228 17:00:04.796386 1 shared_informer.go:204] Caches are synced for disruption
I1228 17:00:04.796403 1 disruption.go:341] Sending events to api server.
I1228 17:00:04.804131 1 shared_informer.go:204] Caches are synced for ReplicaSet
I1228 17:00:04.806910 1 shared_informer.go:204] Caches are synced for GC
I1228 17:00:04.809821 1 shared_informer.go:204] Caches are synced for taint
I1228 17:00:04.809909 1 node_lifecycle_controller.go:1208] Initializing eviction metric for zone:
W1228 17:00:04.809999 1 node_lifecycle_controller.go:903] Missing timestamp for Node node3. Assuming now as a timestamp.
W1228 17:00:04.810038 1 node_lifecycle_controller.go:903] Missing timestamp for Node node4. Assuming now as a timestamp.
W1228 17:00:04.810065 1 node_lifecycle_controller.go:903] Missing timestamp for Node node1. Assuming now as a timestamp.
W1228 17:00:04.810086 1 node_lifecycle_controller.go:903] Missing timestamp for Node node2. Assuming now as a timestamp.
I1228 17:00:04.810101 1 node_lifecycle_controller.go:1108] Controller detected that zone is now in state Normal.
I1228 17:00:04.810145 1 event.go:255] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"node2", UID:"68d34fcf-fd86-42a5-9833-57108c93baee", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node node2 event: Registered Node node2 in Controller
I1228 17:00:04.810164 1 taint_manager.go:186] Starting NoExecuteTaintManager
I1228 17:00:04.810224 1 event.go:255] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"node3", UID:"dc80b75f-ce55-4247-84e3-bf0474ac1057", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node node3 event: Registered Node node3 in Controller
I1228 17:00:04.810233 1 event.go:255] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"node4", UID:"c9d859df-795e-4b2a-9def-08efc67ba4e3", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node node4 event: Registered Node node4 in Controller
I1228 17:00:04.810242 1 event.go:255] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"node1", UID:"8bfe45c3-2ce7-4013-a11f-c1ac052e9e00", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node node1 event: Registered Node node1 in Controller
I1228 17:00:04.811241 1 shared_informer.go:204] Caches are synced for node
I1228 17:00:04.811367 1 range_allocator.go:172] Starting range CIDR allocator
I1228 17:00:04.811381 1 shared_informer.go:197] Waiting for caches to sync for cidrallocator
I1228 17:00:04.859423 1 shared_informer.go:204] Caches are synced for HPA
I1228 17:00:04.911545 1 shared_informer.go:204] Caches are synced for cidrallocator
I1228 17:00:04.997853 1 shared_informer.go:204] Caches are synced for bootstrap_signer
I1228 17:00:05.023218 1 shared_informer.go:204] Caches are synced for expand
I1228 17:00:05.030277 1 shared_informer.go:204] Caches are synced for PV protection
I1228 17:00:05.059763 1 shared_informer.go:204] Caches are synced for endpoint
I1228 17:00:05.060705 1 shared_informer.go:204] Caches are synced for persistent volume
I1228 17:00:05.118184 1 shared_informer.go:204] Caches are synced for attach detach
I1228 17:00:05.246897 1 shared_informer.go:204] Caches are synced for job
I1228 17:00:05.248850 1 shared_informer.go:204] Caches are synced for resource quota
I1228 17:00:05.257547 1 shared_informer.go:204] Caches are synced for garbage collector
I1228 17:00:05.257566 1 garbagecollector.go:139] Garbage collector: all resource monitors have synced. Proceeding to collect garbage
I1228 17:00:05.260287 1 shared_informer.go:204] Caches are synced for resource quota
I1228 17:00:05.305093 1 shared_informer.go:204] Caches are synced for garbage collector
I1228 17:00:44.906594 1 leaderelection.go:287] failed to renew lease kube-system/kube-controller-manager: failed to tryAcquireOrRenew context deadline exceeded
F1228 17:00:44.906687 1 controllermanager.go:279] leaderelection lost
2 Answers
6rqinv9w1#
This problem shows up when there is a resource crunch or a network issue. In my case the kube-apiserver was hitting a resource crunch, which increased the latency of API calls and caused the leader-election API calls to time out.
kube-apiserver logs:
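To confirm that kind of resource crunch yourself, a couple of read-only checks are useful. A minimal sketch (assumes metrics-server is installed; the apiserver_request_duration_seconds metric name applies to v1.14+):

# Control-plane pod CPU/memory usage (needs metrics-server)
~$ kubectl top pods -n kube-system
# Request latency as reported by the apiserver itself
~$ kubectl get --raw /metrics | grep apiserver_request_duration_seconds | head
# Pressure conditions on the control-plane node
~$ kubectl describe node node1 | grep -A 6 'Conditions:'

Sustained high CPU on the apiserver or etcd pods, or latency climbing past the 10s timeout visible in the scheduler log above, points to the same cause.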
cqoc49vn2#
In my case it was a network issue, and the fix was to increase --leader-elect-lease-duration and --leader-elect-renew-deadline in the kube-controller-manager.yaml manifest.
I increased them to 120s and 60s respectively to check whether it helped.
Make sure the lease duration stays greater than the renew deadline.
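For reference, a sketch of what that change looks like in the static pod manifest (path assumes a kubeadm-style cluster; the kubelet picks up the edit and restarts the pod automatically):

# /etc/kubernetes/manifests/kube-controller-manager.yaml
spec:
  containers:
  - command:
    - kube-controller-manager
    - --leader-elect=true
    - --leader-elect-lease-duration=120s   # default 15s
    - --leader-elect-renew-deadline=60s    # default 10s; must stay below the lease duration
    # ...existing flags unchanged

kube-scheduler accepts the same --leader-elect-* flags, so the matching change in kube-scheduler.yaml covers the other restarting pod.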