Which jobs are failing?
NodeProblemDetector [NodeFeature:NodeProblemDetector] should run without error
Which tests are failing?
ci-kubernetes-e2e-gci-gce-alpha-enabled-default
Since when has it been failing?
For a long time.
Testgrid link
https://testgrid.k8s.io/google-gce#gci-gce-alpha-enabled-default
Reason for failure (if possible)
[FAILED] an error on the server ("Internal Error: failed to list pod stats: rpc error: code = Unknown desc = 1 error occurred:
\t* failed to decode sandbox container metrics for sandbox "2dda500c81c49c73488ad52cb6a19563ac33ccd365b01ea328a1d6381c226398": ttrpc: closed: unknown") has prevented the request from succeeding (get nodes bootstrap-e2e-minion-group-vgh1:10250)
In [It] at: k8s.io/kubernetes/test/e2e/node/node_problem_detector.go:381 @ 07/16/24 21:09:33.122
}
STEP: Gather node-problem-detector cpu and memory stats - k8s.io/kubernetes/test/e2e/node/node_problem_detector.go:164 @ 07/16/24 21:06:26.828
I0716 21:09:33.122599 10712 node_problem_detector.go:381] Unexpected error:
<*errors.StatusError | 0xc0003c1e00>:
an error on the server ("Internal Error: failed to list pod stats: rpc error: code = Unknown desc = 1 error occurred:\n\t* failed to decode sandbox container metrics for sandbox \"2dda500c81c49c73488ad52cb6a19563ac33ccd365b01ea328a1d6381c226398\": ttrpc: closed: unknown") has prevented the request from succeeding (get nodes bootstrap-e2e-minion-group-vgh1:10250)
{
    ErrStatus:
        code: 500
        details:
          causes:
          - message: "Internal Error: failed to list pod stats: rpc error: code = Unknown
              desc = 1 error occurred:\n\t* failed to decode sandbox container metrics for
              sandbox \"2dda500c81c49c73488ad52cb6a19563ac33ccd365b01ea328a1d6381c226398\":
              ttrpc: closed: unknown"
            reason: UnexpectedServerResponse
          kind: nodes
          name: bootstrap-e2e-minion-group-vgh1:10250
        message: 'an error on the server ("Internal Error: failed to list pod stats: rpc error:
          code = Unknown desc = 1 error occurred:\n\t* failed to decode sandbox container
          metrics for sandbox \"2dda500c81c49c73488ad52cb6a19563ac33ccd365b01ea328a1d6381c226398\":
          ttrpc: closed: unknown") has prevented the request from succeeding (get nodes bootstrap-e2e-minion-group-vgh1:10250)'
        metadata: {}
        reason: InternalError
        status: Failure,
}
[FAILED] an error on the server ("Internal Error: failed to list pod stats: rpc error: code = Unknown desc = 1 error occurred:\n\t* failed to decode sandbox container metrics for sandbox \"2dda500c81c49c73488ad52cb6a19563ac33ccd365b01ea328a1d6381c226398\": ttrpc: closed: unknown") has prevented the request from succeeding (get nodes bootstrap-e2e-minion-group-vgh1:10250)
In [It] at: k8s.io/kubernetes/test/e2e/node/node_problem_detector.go:381 @ 07/16/24 21:09:33.122
< Exit [It] should run without error - k8s.io/kubernetes/test/e2e/node/node_problem_detector.go:63 @ 07/16/24 21:09:33.122 (3m10.139s)
Anything else we need to know?
Similar issue: #122118
Relevant SIG(s)
/sig node
ozxc1zmp1#
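This issue is currently awaiting triage.
If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.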
cetgtptt2#
/cc @humblec @SergeyKanzhelev @wangzhen127
mbzjlibv3#
The NPD-specific CI job is passing: https://testgrid.k8s.io/sig-node-node-problem-detector#ci-npd-e2e-kubernetes-gce-gci. That job runs NPD as a system daemon.
Does the gci-gce-alpha-enabled-default job run NPD as a DaemonSet instead? @hakman, could you take a look?
/cc @AnishShah@DigitalVeer
aydmsdu94#
The Node Problem Detector configuration for the cluster is in kubernetes/cluster/addons/node-problem-detector/npd.yaml, lines 26 to 48. It defines a DaemonSet that detects node problems in the Kubernetes cluster. The content of the configuration file is as follows:
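According to the configuration file, this DaemonSet runs the registry.k8s.io/node-problem-detector/node-problem-detector:v0.8.19 image. It detects problems on the cluster's nodes and reports them to the API server as node conditions and events.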