K3s: multiple kube-system pods not running and stuck in Unknown status, including the DNS pod

Posted by 9rygscc1 on 2022-10-06 in Kubernetes

**Environmental info:** K3s version: k3s version v1.24.3+k3s1 (990ba0e8), go version go1.18.1

Node CPU architecture, OS, and version: 5 Raspberry Pi 4s running headless 64-bit Raspbian, each reporting the following: Linux 5.15.56-v8+ #1575 SMP PREEMPT Fri Jul 22 20:31:26 BST 2022 aarch64 GNU/Linux

Cluster configuration: 3 nodes configured as control plane, 2 nodes configured as workers

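**Describe the bug:** The pods coredns-b96499967-ktgtc, local-path-provisioner-7b7dc8d6f5-5cfds, metrics-server-668d979685-9szb9, traefik-7cd4fcff68-gfmhm, and svclb-traefik-aa9f6b38-j27sw are in Unknown status with 0/1 containers ready. This means the cluster DNS service is not working, so pods cannot resolve internal or external names.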
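To see at a glance which pods are affected, the `kubectl get pods` table can be filtered. The following is a minimal sketch, not part of the original report: the function name `not_ready_pods` is hypothetical, and it assumes the default column layout (NAME, READY, STATUS as the first three columns). Pods in `Completed` status are skipped because finished job pods are expected to show 0/1.

```shell
# Print NAME, READY and STATUS for pods that are not fully ready.
# Reads the table printed by `kubectl get pods` on stdin.
# Assumes the default columns: NAME READY STATUS ...
not_ready_pods() {
  awk 'NR > 1 && $3 != "Completed" {
         split($2, ready, "/")   # READY looks like "0/1"
         if (ready[1] != ready[2] || $3 != "Running") print $1, $2, $3
       }'
}

# Usage (needs access to the cluster):
#   kubectl -n kube-system get pods | not_ready_pods
```

Only the first three columns are read, so the multi-word RESTARTS column that newer kubectl versions print does not affect the result.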

Steps to reproduce:

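**Expected behavior:** The critical pods should be Running with a known status. DNS should also work, meaning that headless services work and pods can resolve hostnames inside and outside the cluster.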

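**Actual behavior:** The pods listed above stay in Unknown status and the DNS pod is not running, so pods cannot resolve hostnames inside or outside the cluster and headless services do not work.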
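The DNS symptom can be checked from inside the cluster with a throwaway pod. This is a sketch under assumptions that are not in the original report: the function name `check_cluster_dns`, the pod name `dnstest`, the `busybox:1.28` image, and the external name `example.com` are all illustrative choices.

```shell
# Start a temporary pod and resolve one in-cluster and one external name.
# Assumptions (illustrative only): pod name "dnstest", image busybox:1.28,
# and example.com as the external hostname.
check_cluster_dns() {
  kubectl run dnstest --rm -i --restart=Never --image=busybox:1.28 -- \
    sh -c 'nslookup kubernetes.default.svc.cluster.local && nslookup example.com'
}

# Usage (needs access to the cluster):
#   check_cluster_dns && echo "DNS looks fine" || echo "DNS lookup failed"
```

Both lookups run in a single pod so the test does not depend on the first pod being fully deleted before a second one starts. If the test pod itself never starts, that points at the node or container runtime rather than at CoreDNS.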

Additional context / logs:

kubectl -n kube-system get configmap coredns -o go-template={{.data.Corefile}}

.:53 {
    errors
    health
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
      pods insecure
      fallthrough in-addr.arpa ip6.arpa
    }
    hosts /etc/coredns/NodeHosts {
      ttl 60
      reload 15s
      fallthrough
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
import /etc/coredns/custom/*.server

Description of the relevant pods:

kubectl describe  pods --namespace=kube-system
Name:                 coredns-b96499967-ktgtc
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 master0/192.168.0.68
Start Time:           Fri, 05 Aug 2022 16:09:38 +0100
Labels:               k8s-app=kube-dns
                      pod-template-hash=b96499967
Annotations:          <none>
Status:               Running
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-b96499967
Containers:
  coredns:
    Container ID:  containerd://1a83a59275abdb7b783aa06eb56cb1e5367c1ca196598851c2b7d5154c0a4bb9
    Image:         rancher/mirrored-coredns-coredns:1.9.1
    Image ID:      docker.io/rancher/mirrored-coredns-coredns@sha256:35e38f3165a19cb18c65d83334c13d61db6b24905f45640aa8c2d2a6f55ebcb0
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Terminated
      Reason:       Unknown
      Exit Code:    255
      Started:      Fri, 05 Aug 2022 19:19:19 +0100
      Finished:     Fri, 05 Aug 2022 19:20:29 +0100
    Ready:          False
    Restart Count:  8
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=1s period=10s #success=1 #failure=3
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=2s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /etc/coredns/custom from custom-config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zbbxf (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  custom-config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns-custom
    Optional:  true
  kube-api-access-zbbxf:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              beta.kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                             node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason          Age                    From     Message
  ----    ------          ----                   ----     -------
  Normal  SandboxChanged  41d (x419 over 41d)    kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal  SandboxChanged  64m (x11421 over 42h)  kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal  SandboxChanged  2m24s (x139 over 32m)  kubelet  Pod sandbox changed, it will be killed and re-created.

Name:                 metrics-server-668d979685-9szb9
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 master0/192.168.0.68
Start Time:           Fri, 05 Aug 2022 16:09:38 +0100
Labels:               k8s-app=metrics-server
                      pod-template-hash=668d979685
Annotations:          <none>
Status:               Running
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/metrics-server-668d979685
Containers:
  metrics-server:
    Container ID:  containerd://cd02643f7d7bc78ea98abdec20558626cfac39f70e1127b2281342dd00905e44
    Image:         rancher/mirrored-metrics-server:v0.5.2
    Image ID:      docker.io/rancher/mirrored-metrics-server@sha256:48ecad4fe641a09fa4459f93c7ad29d4916f6b9cf7e934d548f1d8eff96e2f35
    Port:          4443/TCP
    Host Port:     0/TCP
    Args:
      --cert-dir=/tmp
      --secure-port=4443
      --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
      --kubelet-use-node-status-port
      --metric-resolution=15s
    State:          Terminated
      Reason:       Unknown
      Exit Code:    255
      Started:      Fri, 05 Aug 2022 19:19:19 +0100
      Finished:     Fri, 05 Aug 2022 19:20:29 +0100
    Ready:          False
    Restart Count:  8
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get https://:https/livez delay=60s timeout=1s period=10s #success=1 #failure=3
    Readiness:    http-get https://:https/readyz delay=0s timeout=1s period=2s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /tmp from tmp-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-djqgk (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  tmp-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kube-api-access-djqgk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                             node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason          Age                    From     Message
  ----    ------          ----                   ----     -------
  Normal  SandboxChanged  41d (x418 over 41d)    kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal  SandboxChanged  64m (x11427 over 42h)  kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal  SandboxChanged  2m27s (x141 over 32m)  kubelet  Pod sandbox changed, it will be killed and re-created.

Name:                 traefik-7cd4fcff68-gfmhm
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 master0/192.168.0.68
Start Time:           Fri, 05 Aug 2022 16:10:43 +0100
Labels:               app.kubernetes.io/instance=traefik
                      app.kubernetes.io/managed-by=Helm
                      app.kubernetes.io/name=traefik
                      helm.sh/chart=traefik-10.19.300
                      pod-template-hash=7cd4fcff68
Annotations:          prometheus.io/path: /metrics
                      prometheus.io/port: 9100
                      prometheus.io/scrape: true
Status:               Running
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/traefik-7cd4fcff68
Containers:
  traefik:
    Container ID:  containerd://779a1596fb204a7577acda97e9fb3f4c5728cf1655071d8e5faad6a8d407d217
    Image:         rancher/mirrored-library-traefik:2.6.2
    Image ID:      docker.io/rancher/mirrored-library-traefik@sha256:ad2226527eea71b7591d5e9dcc0bffd0e71b2235420c34f358de6db6d529561f
    Ports:         9100/TCP, 9000/TCP, 8000/TCP, 8443/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Args:
      --global.checknewversion
      --global.sendanonymoususage
      --entrypoints.metrics.address=:9100/tcp
      --entrypoints.traefik.address=:9000/tcp
      --entrypoints.web.address=:8000/tcp
      --entrypoints.websecure.address=:8443/tcp
      --api.dashboard=true
      --ping=true
      --metrics.prometheus=true
      --metrics.prometheus.entrypoint=metrics
      --providers.kubernetescrd
      --providers.kubernetesingress
      --providers.kubernetesingress.ingressendpoint.publishedservice=kube-system/traefik
      --entrypoints.websecure.http.tls=true
    State:          Terminated
      Reason:       Unknown
      Exit Code:    255
      Started:      Fri, 05 Aug 2022 19:19:19 +0100
      Finished:     Fri, 05 Aug 2022 19:20:29 +0100
    Ready:          False
    Restart Count:  8
    Liveness:       http-get http://:9000/ping delay=10s timeout=2s period=10s #success=1 #failure=3
    Readiness:      http-get http://:9000/ping delay=10s timeout=2s period=10s #success=1 #failure=1
    Environment:    <none>
    Mounts:
      /data from data (rw)
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jw4qc (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kube-api-access-jw4qc:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                             node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason          Age                    From     Message
  ----    ------          ----                   ----     -------
  Normal  SandboxChanged  41d (x415 over 41d)    kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal  SandboxChanged  64m (x11418 over 42h)  kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal  SandboxChanged  2m30s (x141 over 32m)  kubelet  Pod sandbox changed, it will be killed and re-created.
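
The describe output above is long, and the fields that matter here are the container state, reason, exit code, and restart count. A rough sketch for condensing it to one line per pod follows; the function name `summarize_pods` is hypothetical, and the parsing assumes the field labels shown above.

```shell
# Condense `kubectl describe pods` output (read from stdin) to one line
# per pod: name, container state, reason, exit code, restart count.
# For multi-container pods the last container listed wins.
summarize_pods() {
  awk '
    function emit() { if (name != "") print name, state, reason, code, restarts }
    /^Name:/                  { emit(); name = $2
                                state = reason = code = restarts = "-"; last = 0 }
    $1 == "State:"            { state = $2; last = 0 }
    $1 == "Last"              { last = 1 }   # "Last State:" block, ignored
    $1 == "Ready:"            { last = 0 }
    !last && $1 == "Reason:"  { reason = $2 }
    !last && $1 == "Exit"     { code = $3 }
    $1 == "Restart"           { restarts = $3 }
    END                       { emit() }
  '
}

# Usage (needs access to the cluster):
#   kubectl describe pods --namespace=kube-system | summarize_pods
```

Only an unindented `Name:` starts a new pod, so the indented `Name:` lines in the Volumes section are not mistaken for pod names.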

eqqqjvef #1

The fix I found, at least for now, is to manually restart all of the kube-system deployments, which can be listed with:

kubectl get deployments --namespace=kube-system

If they are all similarly not ready, each one can be restarted with:

kubectl -n kube-system rollout restart deployment <deployment>

Specifically, the coredns, local-path-provisioner, metrics-server, and traefik deployments all needed to be restarted.
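
The restart of those four deployments can be scripted. This is a minimal sketch rather than a tested procedure: the function name `restart_kube_system` is hypothetical, and the 120-second timeout is an arbitrary choice.

```shell
# Restart the four deployments named above, then wait for each rollout.
# The 120s timeout is an arbitrary choice; adjust it for slower nodes.
restart_kube_system() {
  for d in coredns local-path-provisioner metrics-server traefik; do
    kubectl -n kube-system rollout restart "deployment/$d" || return 1
  done
  for d in coredns local-path-provisioner metrics-server traefik; do
    kubectl -n kube-system rollout status "deployment/$d" --timeout=120s || return 1
  done
}

# Usage (needs access to the cluster):
#   restart_kube_system
```

All restarts are triggered before any waiting starts, so a slow rollout of one deployment does not delay the restart of the others.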
