Kubernetes: frequently unable to communicate with the kubelet API (connection refused) [closed]

Posted by mwkjh3gx on 2023-06-05 in Kubernetes

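**Closed.** This question is not about programming or software development. It is not currently accepting answers.

This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where it could be answered.
Closed 2 days ago.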
I am deploying a new Kubernetes cluster on a single node (Ubuntu 22.04).
The problem is that I frequently get this error when running any kubectl command (hostname changed):
The connection to the server k8cluster.example.com:6443 was refused - did you specify the right host or port?
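After I installed Kubernetes (via apt install -y kubelet kubeadm kubectl), everything was stable, although, as expected, the node was not in the Ready state. The problem started when I deployed the Flannel container network, which I did like this: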
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
The pods in the kube-system namespace restart frequently:

root@k8cluster:~/.ssh# kubectl get all -A
NAMESPACE      NAME                                                 READY   STATUS             RESTARTS         AGE
kube-flannel   pod/kube-flannel-ds-6h6zq                            1/1     Running            25 (46s ago)     98m
kube-system    pod/coredns-5d78c9869d-gmdpv                         0/1     CrashLoopBackOff   18 (4m40s ago)   130m
kube-system    pod/coredns-5d78c9869d-zhvxk                         1/1     Running            19 (14m ago)     130m
kube-system    pod/etcd-k8cluster.example.com                       1/1     Running            31 (7m21s ago)   130m
kube-system    pod/kube-apiserver-k8cluster.example.com            1/1     Running            37 (5m40s ago)   131m
kube-system    pod/kube-controller-manager-k8cluster.example.com    0/1     Running            46 (5m10s ago)   130m
kube-system    pod/kube-proxy-nvnkf                                 0/1     CrashLoopBackOff   41 (100s ago)    130m
kube-system    pod/kube-scheduler-k8cluster.example.com             0/1     CrashLoopBackOff   44 (4m43s ago)   129m

NAMESPACE     NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
default       service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP                  132m
kube-system   service/kube-dns     ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   132m

NAMESPACE      NAME                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-flannel   daemonset.apps/kube-flannel-ds   1         1         1       1            1           <none>                   98m
kube-system    daemonset.apps/kube-proxy        1         1         1       1            1           kubernetes.io/os=linux   132m

NAMESPACE     NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/coredns   2/2     2            2           132m

NAMESPACE     NAME                                 DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/coredns-5d78c9869d   2         2         2       130m
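
A listing like the one above can be narrowed to the pods that restart most often. The following is a minimal sketch rather than part of the original post: `flapping_pods` is a hypothetical helper name, and it assumes rows in the column order printed by `kubectl get pods -A --no-headers` (namespace, name, ready, status, restarts).

```shell
# Print pods whose restart count is at or above a threshold (default 10).
# Reads rows in `kubectl get pods -A --no-headers` format on stdin:
#   NAMESPACE NAME READY STATUS RESTARTS [(last restart)] AGE
# flapping_pods is a hypothetical helper name, not a kubectl feature.
flapping_pods() {
  threshold="${1:-10}"
  # $5 is the restart count; "+ 0" forces a numeric comparison.
  awk -v min="$threshold" '$5 + 0 >= min { print $1 "/" $2, $4, $5 }'
}

# Usage on a live cluster (not executed here):
#   kubectl get pods -A --no-headers | flapping_pods 20
```

With the output shown in the question, a threshold of 20 would keep the flannel, etcd, and control-plane pods and drop both CoreDNS pods.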

I see these errors when running journalctl -u kubelet:

Jun 02 13:16:21 k8cluster.example.com kubelet[19340]: I0602 13:16:21.848785   19340 scope.go:115] "RemoveContainer" containerID="4da5cc966a4dcf61001cbdbad36c47917fdfeb05bd7c4c985b2f362efa92f464"
Jun 02 13:16:21 k8cluster.example.com kubelet[19340]: I0602 13:16:21.849006   19340 status_manager.go:809] "Failed to get status for pod" podUID=aae126ec9b57a8789f7682f92e81bd7a pod="kube-system/etcd-k8cluster.example.com" err="Get \"https://k8cluster.example.com:6443/api/v1/namespaces/kube-system/pods/etcd-k8cluster.example.com\": dial tcp 172.31.37.108:6443: connect: connection refused"
Jun 02 13:16:21 k8cluster.example.com kubelet[19340]: E0602 13:16:21.849262   19340 pod_workers.go:1294] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 2m40s restarting failed container=kube-apiserver pod=kube-apiserver-spotcluster.infdev.org_kube-system(ccdffaba21456689fa71a8f7b182fb0c)\"" pod="kube-system/kube-apiserver-k8cluster.example.com" podUID=ccdffaba21456689fa71a8f7b182fb0c
Jun 02 13:16:21 k8cluster.example.com kubelet[19340]: I0602 13:16:21.849317   19340 status_manager.go:809] "Failed to get status for pod" podUID=ccdffaba21456689fa71a8f7b182fb0c pod="kube-system/kube-apiserver-k8cluster.example.com" err="Get \"https://k8cluster.example.com:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-k8cluster.example.com\": dial tcp 172.31.37.108:6443: connect: connection refused"
Jun 02 13:16:21 k8cluster.example.com kubelet[19340]: I0602 13:16:21.866932   19340 scope.go:115] "RemoveContainer" containerID="46f9e127efbd2506f390486c2590232e76b0617561c7c440d94c470a4164448f"
Jun 02 13:16:21 k8cluster.example.com kubelet[19340]: E0602 13:16:21.867259   19340 pod_workers.go:1294] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"coredns\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=coredns pod=coredns-5d78c9869d-gmdpv_kube-system(ddf0658a-260b-41d1-a0a0-595de4991ec6)\"" pod="kube-system/coredns-5d78c9869d-gmdpv" podUID=ddf0658a-260b-41d1-a0a0-595de4991ec6
Jun 02 13:16:22 k8cluster.example.com kubelet[19340]: I0602 13:16:22.850577   19340 scope.go:115] "RemoveContainer" containerID="4da5cc966a4dcf61001cbdbad36c47917fdfeb05bd7c4c985b2f362efa92f464"
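
To see at a glance which containers are crash-looping, kubelet journal lines like those above can be tallied per container. This is a rough sketch under stated assumptions: `crashloop_summary` is a hypothetical helper name, and it relies on the `restarting failed container=<name>` wording that appears in the log lines quoted here, which may differ in other kubelet versions.

```shell
# Count CrashLoopBackOff entries per container in kubelet journal output.
# Reads journal lines on stdin and prints "<count> <container>" rows,
# most frequent first. crashloop_summary is a hypothetical helper name.
crashloop_summary() {
  grep 'CrashLoopBackOff' \
    | grep -o 'failed container=[^ ]*' \
    | sed 's/^failed container=//' \
    | sort | uniq -c | sort -rn
}

# Usage on the node (not executed here):
#   journalctl -u kubelet --no-pager | crashloop_summary
```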

In addition, dmesg shows the following messages:

[Fri Jun  2 13:02:11 2023] IPv6: ADDRCONF(NETDEV_CHANGE): veth11eea1b5: link becomes ready
[Fri Jun  2 13:02:11 2023] cni0: port 1(veth11eea1b5) entered blocking state
[Fri Jun  2 13:02:11 2023] cni0: port 1(veth11eea1b5) entered forwarding state
[Fri Jun  2 13:11:54 2023] cni0: port 2(veth92694dfb) entered disabled state
[Fri Jun  2 13:11:54 2023] device veth92694dfb left promiscuous mode
[Fri Jun  2 13:11:54 2023] cni0: port 2(veth92694dfb) entered disabled state
[Fri Jun  2 13:11:55 2023] cni0: port 2(veth29e5e0d3) entered blocking state
[Fri Jun  2 13:11:55 2023] cni0: port 2(veth29e5e0d3) entered disabled state
[Fri Jun  2 13:11:55 2023] device veth29e5e0d3 entered promiscuous mode
[Fri Jun  2 13:11:55 2023] cni0: port 2(veth29e5e0d3) entered blocking state
[Fri Jun  2 13:11:55 2023] cni0: port 2(veth29e5e0d3) entered forwarding state
[Fri Jun  2 13:11:55 2023] IPv6: ADDRCONF(NETDEV_CHANGE): veth29e5e0d3: link becomes ready
[Fri Jun  2 13:13:19 2023] cni0: port 1(veth11eea1b5) entered disabled state
[Fri Jun  2 13:13:19 2023] device veth11eea1b5 left promiscuous mode
[Fri Jun  2 13:13:19 2023] cni0: port 1(veth11eea1b5) entered disabled state
[Fri Jun  2 13:13:20 2023] cni0: port 1(veth1f7fb9e0) entered blocking state
[Fri Jun  2 13:13:20 2023] cni0: port 1(veth1f7fb9e0) entered disabled state
[Fri Jun  2 13:13:20 2023] device veth1f7fb9e0 entered promiscuous mode
[Fri Jun  2 13:13:20 2023] cni0: port 1(veth1f7fb9e0) entered blocking state
[Fri Jun  2 13:13:20 2023] cni0: port 1(veth1f7fb9e0) entered forwarding state
[Fri Jun  2 13:13:20 2023] IPv6: ADDRCONF(NETDEV_CHANGE): veth1f7fb9e0: link becomes ready

If I look at the logs of the kube-apiserver pod, I see this repeating:

Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused"
W0602 13:21:03.884015       1 logging.go:59] [core] [Channel #148 SubChannel #149] grpc: addrConn.createTransport failed to connect to {
  "Addr": "127.0.0.1:2379",
  "ServerName": "127.0.0.1",
  "Attributes": null,
  "BalancerAttributes": null,
  "Type": 0,
  "Metadata": null

Any ideas?

zyfwsgd6 (answer #1)

It looks like I had the same problem as the one described in this question:
Unable to bring up kubernetes API server
The solution given there worked for me:

containerd config default | tee /etc/containerd/config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml  
service containerd restart
service kubelet restart
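
A likely reason this works: kubeadm configures the kubelet with the systemd cgroup driver by default (since v1.22), while the config written by `containerd config default` has `SystemdCgroup = false`, so the two disagree until the sed edit is applied. The sketch below is one way to compare the two settings. The helper names are hypothetical, and the default paths assume a standard kubeadm and containerd install.

```shell
# Report the cgroup driver implied by a containerd config file:
# "systemd" when SystemdCgroup = true is present, otherwise "cgroupfs".
# Both function names are hypothetical helpers; default paths are assumptions.
containerd_cgroup_driver() {
  config="${1:-/etc/containerd/config.toml}"
  if grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$config"; then
    echo systemd
  else
    echo cgroupfs
  fi
}

# Print the cgroupDriver value from a kubelet config file.
kubelet_cgroup_driver() {
  config="${1:-/var/lib/kubelet/config.yaml}"
  sed -n 's/^cgroupDriver:[[:space:]]*//p' "$config"
}

# Usage on the node (not executed here):
#   [ "$(containerd_cgroup_driver)" = "$(kubelet_cgroup_driver)" ] \
#     && echo "cgroup drivers match" || echo "cgroup drivers differ"
```

If the two values differ after the restart, the sed pattern probably did not match the spacing in the config file.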
