Kubernetes: unable to get pod/node metrics

Posted by 3vpjnl9f on 2023-11-17 in Kubernetes

I have installed metrics-server on Kubernetes v1.11.2.
I am running a bare-metal cluster with 3 worker nodes and 1 master.
The metrics-server logs show the following errors:

E0907 14:29:51.774592       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:vps01: unable to fetch metrics from Kubelet vps01 (vps01): Get https://vps01:10250/stats/summary/: dial tcp: lookup vps01 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:vps04: unable to fetch metrics from Kubelet vps04 (vps04): Get https://vps04:10250/stats/summary/: dial tcp: lookup vps04 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:vps03: unable to fetch metrics from Kubelet vps03 (vps03): Get https://vps03:10250/stats/summary/: dial tcp: lookup vps03 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:vps02: unable to fetch metrics from Kubelet vps02 (vps02): Get https://vps02:10250/stats/summary/: dial tcp: lookup vps02 on 10.96.0.10:53: no such host]
E0907 14:30:01.694794       1 reststorage.go:98] unable to fetch pod metrics for pod boxweb/boxweb-deployment-7756c49688-fz625: no metrics known for pod "boxweb/boxweb-deployment-7756c49688-fz625"
E0907 14:30:10.517886       1 reststorage.go:112] unable to fetch node metrics for node "vps01": no metrics known for node "vps01"

I also cannot get any metrics with kubectl top node vps01.
Autoscaling is affected in the same way; it does not work:

unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)

Answer #1 (q35jwt9p)

I found the following solution.
Edit the metrics-server-deployment.yaml file and add:

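command:
    - /metrics-server
    # Connect to kubelets by node InternalIP instead of hostname (avoids the DNS lookup)
    - --kubelet-preferred-address-types=InternalIP
    # Skip verification of the kubelet serving certificate
    - --kubelet-insecure-tls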


Answer #2 (kq0g1dla)

Your metrics-server pod appears to have a DNS problem. You can exec into the pod and check:

kubectl exec -it metrics-server-xxxxxxxxxx-xxxxx -n kube-system sh
/ # ping vps01

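If the ping fails, the pod cannot resolve your node names.
CoreDNS or kube-dns also uses /etc/resolv.conf on your nodes, so I would check whether the nodes can resolve each other. For example, can you ping vps01 from vps02, vps03, and so on?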
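
The node-to-node check described above can be scripted instead of pinging by hand. This is only a minimal sketch: check_resolves is a hypothetical helper name, it assumes getent is available (it usually is on glibc-based Linux nodes, but may not be inside the metrics-server container), and it tests name resolution only, not reachability. Run it on each node.

```shell
# Sketch: report which node hostnames resolve from the current machine.
# check_resolves is a hypothetical helper, not part of kubectl or metrics-server.
check_resolves() {
  # getent consults /etc/hosts and DNS according to nsswitch.conf
  if getent hosts "$1" >/dev/null 2>&1; then
    echo "ok   $1"
  else
    echo "FAIL $1"
    return 1
  fi
}

# Node names taken from the question
for node in vps01 vps02 vps03 vps04; do
  check_resolves "$node" || true
done
```

Any FAIL line points at a name the node itself cannot resolve, which is worth fixing before looking at the cluster DNS.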

Answer #3 (gz5pxeao)

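I had the same problem and solved it by adding the hostnames to /etc/hosts on every node.
To collect metrics (CPU and memory usage), metrics-server tries to reach each node. It cannot resolve the hostnames (vps01, vps02, vps03, vps04) because they are not registered in DNS. As you mentioned, you cannot register the hostnames in DNS.
Therefore you have to add the hostnames to /etc/hosts on the node where the metrics-server pod runs.
The autoscaler does not work because metrics-server is not working, so there is no metrics data for it to use.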
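
As a rough sketch of the /etc/hosts change, the helper below appends an entry only if the hostname is not already mapped, so it can be run repeatedly. The function name add_host_entry is hypothetical, and the 192.0.2.x addresses in the usage comments are documentation placeholders, not the real node IPs; substitute your own.

```shell
# Sketch: add "IP hostname" to a hosts file unless the hostname is already present.
# add_host_entry is a hypothetical helper; arguments: FILE IP HOSTNAME
add_host_entry() {
  hosts_file=$1
  ip=$2
  name=$3
  # Look for the hostname as a whole field (columns 2+) on any non-comment line
  if awk -v n="$name" '
      $1 ~ /^#/ { next }
      { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }' "$hosts_file"; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# Usage (as root, on each node; the IPs are placeholders):
#   add_host_entry /etc/hosts 192.0.2.11 vps01
#   add_host_entry /etc/hosts 192.0.2.12 vps02
```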

Answer #4 (lnvxswe2)

Patch the metrics-server deployment:

$ kubectl patch -n kube-system deployment metrics-server --type=json \
  -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
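
To confirm that a fix took effect, one option is to check whether the metrics API is reported as available before retrying kubectl top. The sketch below assumes the API is registered under the usual APIService name v1beta1.metrics.k8s.io; metrics_api_available is a hypothetical helper name.

```shell
# Sketch: succeed only if the metrics APIService reports Available=True.
# metrics_api_available is a hypothetical helper; it assumes the APIService
# is named v1beta1.metrics.k8s.io.
metrics_api_available() {
  status=$(kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' 2>/dev/null)
  [ "$status" = "True" ]
}

# Usage:
#   metrics_api_available && kubectl top node
```

It can take a minute or so after the pod restarts before the first metrics are collected, so a failing check right after the change is not conclusive.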
