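I keep running into this problem: every so often Knative fails to create a new deployment, then spontaneously recovers and creates it within a few hours. Before that happens, the following errors show up in the serving components. It looks to me as if requests to the Kubernetes service are timing out, but I don't know why.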
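Expected behavior
When a service is updated, the new revision should be deployed and work.
Actual behavior
Occasionally, after a valid change (for example, changing the value of an annotation), Knative fails to deploy the new revision and stays stuck trying to reconcile it for hours before it spontaneously recovers.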
$ kn revision list -A
NAMESPACE NAME SERVICE TRAFFIC TAGS GENERATION AGE CONDITIONS READY REASON
knative service-00033 service 33 <invalid> 0 OK / 3 Unknown Deploying
knative service-00032 service 100% primary 32 <invalid> 4 OK / 4 True
In the controller logs, I see the following context deadline exceeded error when it tries to POST to the Kubernetes service IP:
{
"insertId": "plhs429mzmf9nh5f",
"jsonPayload": {
"logger": "controller.event-broadcaster",
"caller": "record/event.go:285",
"knative.dev/pod": "controller-8c6b99cb7-7zg6n",
"commit": "484e848",
"message": "Event(v1.ObjectReference{Kind:\"Revision\", Namespace:\"knative\", Name:\"service-00033\", UID:\"8a09a3ff-655e-4e5f-b8d4-1a4886ab0678\", APIVersion:\"serving.knative.dev/v1\", ResourceVersion:\"1844291799\", FieldPath:\"\"}): type: 'Warning' reason: 'InternalError' failed to create deployment \"service-api-00033-deployment\": Post \"https://10.123.20.1:443/apis/apps/v1/namespaces/knative/deployments\": context deadline exceeded",
"timestamp": "2023-06-30T09:57:08.7332053Z"
}
Just before it, the webhook logs contain the following:
{
"insertId": "k078pd2dmx16qrr7",
"jsonPayload": {
"knative.dev/pod": "webhook-d44b476b8-89gbx",
"message": "Failed the resource specific validation",
"knative.dev/operation": "UPDATE",
"logger": "webhook",
"knative.dev/name": "service",
"knative.dev/subresource": "",
"knative.dev/namespace": "knative",
"knative.dev/kind": "serving.knative.dev/v1, Kind=Service",
"knative.dev/resource": "serving.knative.dev/v1, Resource=services",
"commit": "484e848",
"knative.dev/userinfo": "system:serviceaccount:service:default",
"timestamp": "2023-06-30T09:56:38.327880939Z",
"caller": "validation/validation_admit.go:183",
"stacktrace": "knative.dev/pkg/webhook/resourcesemantics/validation.validate\n\tknative.dev/pkg@v0.0.0-20230117181655-247510c00e9d/webhook/resourcesemantics/validation/validation_admit.go:183\nknative.dev/pkg/webhook/resourcesemantics/validation.(*reconciler).Admit\n\tknative.dev/pkg@v0.0.0-20230117181655-247510c00e9d/webhook/resourcesemantics/validation/validation_admit.go:79\nknative.dev/pkg/webhook.admissionHandler.func1\n\tknative.dev/pkg@v0.0.0-20230117181655-247510c00e9d/webhook/admission.go:123\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2109\nnet/http.(*ServeMux).ServeHTTP\n\tnet/http/server.go:2487\nknative.dev/pkg/webhook.(*Webhook).ServeHTTP\n\tknative.dev/pkg@v0.0.0-20230117181655-247510c00e9d/webhook/webhook.go:263\nknative.dev/pkg/network/handlers.(*Drainer).ServeHTTP\n\tknative.dev/pkg@v0.0.0-20230117181655-247510c00e9d/network/handlers/drain.go:113\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2947\nnet/http.(*conn).serve\n\tnet/http/server.go:1991"
}
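To see how far apart the two log entries above are, the timestamps can be subtracted directly. This is only a sketch: `parse_rfc3339` is a hypothetical helper written for this example, and the two timestamp strings are copied from the webhook and controller entries. The size of the gap may hint at a timeout-like interval, but two entries alone do not establish that.

```python
from datetime import datetime, timezone


def parse_rfc3339(ts: str) -> datetime:
    """Parse a UTC timestamp such as 2023-06-30T09:57:08.7332053Z.

    The fractional part is truncated to microseconds because
    datetime cannot store nanoseconds.
    """
    body = ts.rstrip("Z")
    if "." in body:
        whole, frac = body.split(".")
        frac = frac[:6].ljust(6, "0")
    else:
        whole, frac = body, "000000"
    parsed = datetime.strptime(f"{whole}.{frac}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


# Timestamps copied from the log entries in the question.
webhook_ts = parse_rfc3339("2023-06-30T09:56:38.327880939Z")
controller_ts = parse_rfc3339("2023-06-30T09:57:08.7332053Z")

# Seconds between the webhook validation failure and the controller error.
gap_seconds = (controller_ts - webhook_ts).total_seconds()
print(f"{gap_seconds:.3f}")  # prints 30.405
```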
I am completely at a loss.
Steps to reproduce the problem
Unknown
1 Answer
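I haven't looked at your service YAML, but my hypothesis is that this is related to slow tag to digest resolution. You can try the following:
1. Monitor the latency of registry operations, particularly GET operations.
2. Use image digests when referencing images. They look like @sha256:... rather than :latest, and they ensure that the image does not change after deployment.
3. Disable tag to digest resolution. Note that this can lead to unpredictable behavior if the referenced tag is moved: some instances may pick up the new image while others keep using the older one.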
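As a rough illustration of the second suggestion, the sketch below checks whether image references are already pinned by digest. The function names and the registry.example.com references are hypothetical, and the check assumes sha256 digests with 64 lowercase hex characters.

```python
import re

# A digest-pinned reference ends with "@sha256:" plus 64 lowercase hex characters.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned_by_digest(image_ref: str) -> bool:
    """Return True if the image reference ends with a sha256 digest."""
    return _DIGEST_RE.search(image_ref) is not None


def unpinned_images(image_refs):
    """Return the references that still rely on a tag (or on no tag at all)."""
    return [ref for ref in image_refs if not is_pinned_by_digest(ref)]


# Hypothetical references, for illustration only.
refs = [
    "registry.example.com/team/service:latest",
    "registry.example.com/team/service@sha256:" + "a" * 64,
]
print(unpinned_images(refs))
```

Running this over the images referenced by a service shows which ones would still depend on tag to digest resolution at deploy time.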
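If this is tag to digest resolution and you are using public Docker Hub images, adding pull credentials to the service account that runs the Knative Service may give you a higher rate limit.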