Kubernetes: Knative intermittently fails to create deployments

tktrz96b posted on 2023-08-03 in Kubernetes

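I keep running into this problem: every so often Knative fails to create a new deployment, then recovers on its own a few hours later and creates it. Until it recovers, the following errors keep appearing in the Knative Serving components. From what I can see, requests to the Kubernetes service are timing out, but I can't tell why.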

Expected behavior

When I update a Service, I expect the new revision to deploy successfully.

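Actual behavior

Occasionally, when I make a valid change (for example, changing the value of an annotation), Knative fails to deploy the new revision and stays stuck trying to reconcile it for hours before it spontaneously recovers.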

$ kn revision list -A
NAMESPACE        NAME                       SERVICE              TRAFFIC   TAGS      GENERATION   AGE         CONDITIONS   READY     REASON
knative       service-00033                  service                                 33           <invalid>   0 OK / 3     Unknown   Deploying
knative       service-00032                  service             100%      primary   32           <invalid>   4 OK / 4     True

In the controller logs, I see the following "context deadline exceeded" error when the controller tries to POST to the Kubernetes service IP:

{
  "insertId": "plhs429mzmf9nh5f",
  "jsonPayload": {
    "logger": "controller.event-broadcaster",
    "caller": "record/event.go:285",
    "knative.dev/pod": "controller-8c6b99cb7-7zg6n",
    "commit": "484e848",
    "message": "Event(v1.ObjectReference{Kind:\"Revision\", Namespace:\"knative\", Name:\"service-00033\", UID:\"8a09a3ff-655e-4e5f-b8d4-1a4886ab0678\", APIVersion:\"serving.knative.dev/v1\", ResourceVersion:\"1844291799\", FieldPath:\"\"}): type: 'Warning' reason: 'InternalError' failed to create deployment \"service-api-00033-deployment\": Post \"https://10.123.20.1:443/apis/apps/v1/namespaces/knative/deployments\": context deadline exceeded",
    "timestamp": "2023-06-30T09:57:08.7332053Z"
  }
}


Just before that, the webhook logs contained the following:

{
  "insertId": "k078pd2dmx16qrr7",
  "jsonPayload": {
    "knative.dev/pod": "webhook-d44b476b8-89gbx",
    "message": "Failed the resource specific validation",
    "knative.dev/operation": "UPDATE",
    "logger": "webhook",
    "knative.dev/name": "service",
    "knative.dev/subresource": "",
    "knative.dev/namespace": "knative",
    "knative.dev/kind": "serving.knative.dev/v1, Kind=Service",
    "knative.dev/resource": "serving.knative.dev/v1, Resource=services",
    "commit": "484e848",
    "knative.dev/userinfo": "system:serviceaccount:service:default",
    "timestamp": "2023-06-30T09:56:38.327880939Z",
    "caller": "validation/validation_admit.go:183",
    "stacktrace": "knative.dev/pkg/webhook/resourcesemantics/validation.validate\n\tknative.dev/pkg@v0.0.0-20230117181655-247510c00e9d/webhook/resourcesemantics/validation/validation_admit.go:183\nknative.dev/pkg/webhook/resourcesemantics/validation.(*reconciler).Admit\n\tknative.dev/pkg@v0.0.0-20230117181655-247510c00e9d/webhook/resourcesemantics/validation/validation_admit.go:79\nknative.dev/pkg/webhook.admissionHandler.func1\n\tknative.dev/pkg@v0.0.0-20230117181655-247510c00e9d/webhook/admission.go:123\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2109\nnet/http.(*ServeMux).ServeHTTP\n\tnet/http/server.go:2487\nknative.dev/pkg/webhook.(*Webhook).ServeHTTP\n\tknative.dev/pkg@v0.0.0-20230117181655-247510c00e9d/webhook/webhook.go:263\nknative.dev/pkg/network/handlers.(*Drainer).ServeHTTP\n\tknative.dev/pkg@v0.0.0-20230117181655-247510c00e9d/network/handlers/drain.go:113\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2947\nnet/http.(*conn).serve\n\tnet/http/server.go:1991"
  }
}


I'm completely at a loss.

Steps to reproduce

Unknown

Answer #1 (t9aqgxwy)

I haven't looked at your Service YAML, but my hypothesis is that this could be related to slow tag-to-digest resolution. You could try the following:
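1. Monitor the latency of registry operations, particularly GET operations.
2. Reference images by digest. Digests look like @sha256:... instead of :latest. Also make sure the image doesn't change after it is deployed.
3. Disable tag-to-digest resolution. Be aware that if a referenced tag is moved, this can cause unpredictable behavior: some instances may pick up the new image while others keep using the older one.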
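If the problem is tag-to-digest resolution and you are using public Docker Hub images, adding pull credentials to the service account that runs the Knative Service may give you a higher rate limit.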
