Kubernetes Prefect: User "system:serviceaccount:hm-prefect:prefect-worker" cannot create resource "jobs" in API group "batch" in the namespace "default"

Posted by dwthyt8l on 2023-08-03 in Kubernetes

Prefect work pools and workers have been generally available since 2.11.0, so I am trying to switch from the Prefect agent to the Prefect worker.
I deployed the Prefect Server with:

helm upgrade \
  prefect-server \
  prefect-server \
  --install \
  --repo=https://prefecthq.github.io/prefect-helm \
  --namespace=hm-prefect \
  --create-namespace \
  --values=prefect-server-values.yaml

prefect-server-values.yaml:

server:
  image:
    repository: docker.io/prefecthq/prefect
    prefectTag: 2.11.0-python3.11-kubernetes
  publicApiUrl: https://prefect.mydomain.com/api

I deployed the Prefect Worker with:

helm upgrade \
  prefect-worker \
  prefect-worker \
  --install \
  --repo=https://prefecthq.github.io/prefect-helm \
  --namespace=hm-prefect \
  --create-namespace \
  --values=prefect-worker-values.yaml

prefect-worker-values.yaml:

worker:
  image:
    repository: docker.io/prefecthq/prefect
    prefectTag: 2.11.0-python3.11-kubernetes
  apiConfig: server
  config:
    workPool: hm-kubernetes-pool
  serverApiConfig:
    apiUrl: http://prefect-server.hm-prefect.svc:4200/api
➜ helm list -n hm-prefect
NAME            NAMESPACE   REVISION    UPDATED                                 STATUS      CHART                       APP VERSION
prefect-server  hm-prefect  1           2023-07-31 17:07:50.401888 -0700 PDT    deployed    prefect-server-2023.07.27   2.11.1
prefect-worker  hm-prefect  1           2023-07-31 17:18:57.586027 -0700 PDT    deployed    prefect-worker-2023.07.27   2.11.1

➜ kubectl get deployment -n hm-prefect
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
prefect-server   1/1     1            1           17m
prefect-worker   1/1     1            1           6m2s

I can see the Prefect Worker in the UI.
Then I generated the YAML file with:

➜ prefect deployment build src/main.py:print_platform --name=print-platform --infra-block=kubernetes-job/print-platform-kubernetes-job-block --apply --pool=hm-kubernetes-pool
Found flow 'print-platform'
Deployment YAML created at '/Users/hongbo-miao/Clouds/Git/hongbomiao.com/hm-prefect/workflows/print-platform/print_platform-deployment.yaml'.
Deployment storage None does not have upload capabilities; no files uploaded.  Pass --skip-upload to suppress this warning.
Deployment 'print-platform/print-platform' successfully created with id '7f7603ca-697c-4dca-9bcb-28a889165fe8'.


The generated file print_platform-deployment.yaml contains:

###
### A complete description of a Prefect Deployment for flow 'print-platform'
###
name: print-platform
description: null
version: e4da5dae95465f73a0e3e0bece1555bb
# The work queue that will handle this deployment's runs
work_queue_name: default
work_pool_name: hm-kubernetes-pool
tags: []
parameters: {}
schedule: null
is_schedule_active: true
infra_overrides: {}

###
### DO NOT EDIT BELOW THIS LINE
###
flow_name: print-platform
manifest_path: null
infrastructure:
  type: kubernetes-job
  env: {}
  labels: {}
  name: null
  command: null
  image: ghcr.io/hongbo-miao/hm-prefect-print-platform:latest
  namespace: hm-prefect
  service_account_name: null
  image_pull_policy: Always
  cluster_config: null
  job:
    apiVersion: batch/v1
    kind: Job
    metadata:
      labels: {}
    spec:
      template:
        spec:
          parallelism: 1
          completions: 1
          restartPolicy: Never
          containers:
          - name: prefect-job
            env: []
  customizations: []
  job_watch_timeout_seconds: null
  pod_watch_timeout_seconds: 60
  stream_output: true
  finished_job_ttl: null
  _block_document_id: 1f5b585c-581d-4ca4-adfa-c69dc5319941
  _block_document_name: print-platform-kubernetes-job-block
  _is_anonymous: false
  block_type_slug: kubernetes-job
  _block_type_slug: kubernetes-job
storage: null
path: /opt/prefect/flows
entrypoint: src/main.py:print_platform
parameter_openapi_schema:
  title: Parameters
  type: object
  properties: {}
  required: null
  definitions: null
timestamp: '2023-08-01T00:32:45.975410+00:00'
triggers: []


Next, I tried to run it:

➜ prefect deployment run print-platform/print-platform
Creating flow run for deployment 'print-platform/print-platform'...
Created flow run 'onyx-fennec'.
└── UUID: 065326e7-1d3e-455a-86fb-b15d553af5bd
└── Parameters: {}
└── Scheduled start time: 2023-07-31 17:32:50 PDT (now)
└── URL: https://prefect.mydomain.com/flow-runs/flow-run/065326e7-1d3e-455a-86fb-b15d553af5bd


However, this gave me an error:

Worker 'KubernetesWorker 180550e0-fe47-4a0d-998d-b772d53e14b0' submitting flow run '065326e7-1d3e-455a-86fb-b15d553af5bd'
Creating Kubernetes job...

Failed to submit flow run '065326e7-1d3e-455a-86fb-b15d553af5bd' to infrastructure.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/prefect_kubernetes/worker.py", line 628, in _create_job
    job = batch_client.create_namespaced_job(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api/batch_v1_api.py", line 210, in create_namespaced_job
    return self.create_namespaced_job_with_http_info(namespace, body, **kwargs)  # noqa: E501
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api/batch_v1_api.py", line 309, in create_namespaced_job_with_http_info
    return self.api_client.call_api(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
                    ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 391, in request
    return self.rest_client.POST(url,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/rest.py", line 276, in POST
    return self.request("POST", url,
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/rest.py", line 235, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '7871421d-254d-4d72-9a30-a7ff3306822b', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': 'e5d21bfa-f8ff-4689-965a-2c8efc99569b', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'f86dde2c-b36e-4c12-a44c-31e36a8ecf05', 'Date': 'Tue, 01 Aug 2023 00:32:51 GMT', 'Content-Length': '321'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch is forbidden: User \"system:serviceaccount:hm-prefect:prefect-worker\" cannot create resource \"jobs\" in API group \"batch\" in the namespace \"default\"","reason":"Forbidden","details":{"group":"batch","kind":"jobs"},"code":403}


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/prefect/workers/base.py", line 834, in _submit_run_and_capture_errors
    result = await self.run(
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect_kubernetes/worker.py", line 506, in run
    job = await run_sync_in_worker_thread(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect/utilities/asyncutils.py", line 91, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect_kubernetes/worker.py", line 637, in _create_job
    message += ": " + exc.body["message"]
                      ~~~~~~~~^^^^^^^^^^^
TypeError: string indices must be integers, not 'str'

Completed submission of flow run '065326e7-1d3e-455a-86fb-b15d553af5bd'
Reported flow run '065326e7-1d3e-455a-86fb-b15d553af5bd' as crashed: Flow run could not be submitted to infrastructure


The problem seems to be in this line:
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch is forbidden: User \"system:serviceaccount:hm-prefect:prefect-worker\" cannot create resource \"jobs\" in API group \"batch\" in the namespace \"default\"","reason":"Forbidden","details":{"group":"batch","kind":"jobs"},"code":403}
I don't know why it is trying to create the job in the namespace default instead of hm-prefect. Any ideas? Thanks!
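
A side note on the second traceback: the TypeError raised at exc.body["message"] suggests the Kubernetes client returned the error body as a JSON string rather than a dict, which masked the real 403 message. The sketch below is a minimal, standalone illustration of decoding such a body defensively. extract_api_error_message is a hypothetical helper, not part of prefect_kubernetes or the kubernetes client, and the sample body is copied from the log above.

```python
import json


def extract_api_error_message(body):
    """Return the 'message' field of a Kubernetes API error body.

    Hypothetical helper: accepts a JSON string (as in the traceback),
    bytes, an already-decoded dict, or None.
    """
    if body is None:
        return None
    if isinstance(body, bytes):
        body = body.decode("utf-8", errors="replace")
    if isinstance(body, str):
        try:
            body = json.loads(body)
        except json.JSONDecodeError:
            # Not JSON at all: fall back to the raw text.
            return body
    if isinstance(body, dict):
        return body.get("message")
    return None


# Error body copied from the worker log (raw strings keep the \" escapes).
raw_body = (
    r'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",'
    r'"message":"jobs.batch is forbidden: User '
    r'\"system:serviceaccount:hm-prefect:prefect-worker\" cannot create resource '
    r'\"jobs\" in API group \"batch\" in the namespace \"default\"",'
    r'"reason":"Forbidden","details":{"group":"batch","kind":"jobs"},"code":403}'
)

message = extract_api_error_message(raw_body)
print(message)
```

Indexing the raw string directly, as the traceback shows, fails because a str only accepts integer indices; decoding it first exposes the namespace named in the message.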

Update 1:

➜ prefect init
? Would you like to initialize your deployment configuration with a recipe? [Use arrows to move; enter to select; n to select none]
┏━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃    ┃ Name         ┃ Description                                                                                   ┃
┡━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│    │ s3           │ Store code within an S3 bucket                                                                │
│ >  │ docker       │ Store code within a custom docker image alongside its runtime environment                     │
│    │ docker-s3    │ Store code within S3 and build a custom docker image for runtime                              │
│    │ docker-azure │ Store code within an Azure Blob Storage container and build a custom docker image for runtime │
│    │ azure        │ Store code within an Azure Blob Storage container                                             │
│    │ docker-gcs   │ Store code within GCS and build a custom docker image for runtime                             │
│    │ docker-git   │ Store code within a git repository and build a custom docker image for runtime                │
│    │ local        │ Store code on a local filesystem                                                              │
│    │ git          │ Store code within git repository                                                              │
│    │ gcs          │ Store code within a GCS bucket                                                                │
└────┴──────────────┴───────────────────────────────────────────────────────────────────────────────────────────────┘
    No, I'll use the default deployment configuration.
                         Required inputs for 'docker' recipe
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Field Name ┃ Description                                                          ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ image_name │ The image name, including repository, to give the built Docker image │
│ tag        │ The tag to give the built Docker image                               │
└────────────┴──────────────────────────────────────────────────────────────────────┘
image_name: ghcr.io/hongbo-miao/hm-prefect-print-platform
tag: latest
---------------
Created project in /Users/hongbo-miao/Clouds/Git/hongbomiao.com/hm-prefect/workflows/print-platform with the following new files:
prefect.yaml


I removed the build and push sections because my Docker image is already built. Here is my updated prefect.yaml:

name: print-platform
prefect-version: 2.11.1
pull:
- prefect.deployments.steps.set_working_directory:
    directory: /opt/prefect/print-platform
deployments:
- name: print-platform
  version: null
  tags: []
  description: null
  schedule: {}
  flow_name: null
  entrypoint: src/main.py:print_platform
  parameters: {}
  work_pool:
    name: hm-kubernetes-pool
    work_queue_name: null
    job_variables:
      image: ghcr.io/hongbo-miao/hm-prefect-print-platform


I want to avoid the prompts, so this is how I deployed:

➜ prefect --no-prompt deploy
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Deployment 'print-platform/print-platform' successfully created with id 'e5bb4249-3a9f-4d62-bee2-fc9dce69fbd8'.                                            │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

View Deployment in UI: https://prefect.hongbomiao.com/deployments/deployment/e5bb4249-3a9f-4d62-bee2-fc9dce69fbd8

To execute flow runs from this deployment, start a worker in a separate terminal that pulls work from the 'hm-kubernetes-pool' work pool:

        $ prefect worker start --pool 'hm-kubernetes-pool'

To schedule a run for this deployment, use the following command:

        $ prefect deployment run 'print-platform/print-platform'


Next, I ran:

➜ prefect deployment run print-platform/print-platform
Creating flow run for deployment 'print-platform/print-platform'...
Created flow run 'charming-chimpanzee'.
└── UUID: 1f83d2ee-2584-424e-96ff-11e236ff7f1b
└── Parameters: {}
└── Scheduled start time: 2023-08-01 13:57:20 PDT (now)
└── URL: https://prefect.hongbomiao.com/flow-runs/flow-run/1f83d2ee-2584-424e-96ff-11e236ff7f1b
Worker 'KubernetesWorker 59a0fab6-b9c8-4668-b626-9a5cc0311250' submitting flow run '1f83d2ee-2584-424e-96ff-11e236ff7f1b'
Creating Kubernetes job...
Failed to submit flow run '1f83d2ee-2584-424e-96ff-11e236ff7f1b' to infrastructure.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/connection.py", line 95, in create_connection
    raise err
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
OSError: [Errno 113] No route to host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 714, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 403, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1053, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 363, in connect
    self.sock = conn = self._new_conn()
                       ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0xffffac3a3c10>: Failed to establish a new connection: [Errno 113] No route to host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/prefect/workers/base.py", line 834, in _submit_run_and_capture_errors
    result = await self.run(
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect_kubernetes/worker.py", line 506, in run
    job = await run_sync_in_worker_thread(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect/utilities/asyncutils.py", line 91, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect_kubernetes/worker.py", line 628, in _create_job
    job = batch_client.create_namespaced_job(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api/batch_v1_api.py", line 210, in create_namespaced_job
    return self.create_namespaced_job_with_http_info(namespace, body, **kwargs)  # noqa: E501
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api/batch_v1_api.py", line 309, in create_namespaced_job_with_http_info
    return self.api_client.call_api(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
                    ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 391, in request
    return self.rest_client.POST(url,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/rest.py", line 276, in POST
    return self.request("POST", url,
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/rest.py", line 169, in request
    r = self.pool_manager.request(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/request.py", line 78, in request
    return self.request_encode_body(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/request.py", line 170, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/poolmanager.py", line 376, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 826, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 826, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 826, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 798, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='10.43.0.1', port=443): Max retries exceeded with url: /apis/batch/v1/namespaces/default/jobs (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0xffffac3a3c10>: Failed to establish a new connection: [Errno 113] No route to host'))
Completed submission of flow run '1f83d2ee-2584-424e-96ff-11e236ff7f1b'
Reported flow run '1f83d2ee-2584-424e-96ff-11e236ff7f1b' as crashed: Flow run could not be submitted to infrastructure

This time the error is a bit different, but I am still lost.

Answer #1 (iqxoj9l9)

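The issue here is how you are creating your deployment. This is a common point of confusion, so we are working to make it clearer in the docs.
TL;DR: when using workers, use prefect deploy instead of prefect deployment build ...
The basic problem is that deployments created with prefect deployment build are assumed to be executed by an agent, so they do not correctly inherit the infra configuration from the work pool. Set namespace on the k8s work pool that the new deployments point to, and create those new deployments via prefect deploy.
For example: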

prefect deploy src/main.py:print_platform -p hm-kubernetes-pool

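With prefect deploy (without any additional flags), an interactive wizard finds the flow entrypoints in your project, lets you pick one, and then populates the deployment configuration based on the desired work pool, whether you want a schedule, and so on. At the end of the wizard, you can save the deployment's configuration for later use in CI or other non-interactive settings.
With Prefect workers and work pools, you can pass extra information about each deployment to the server, such as a pull step, which executes an arbitrary process when preparing a flow run (if needed), most commonly prefect.deployments.steps.git_clone.
To define this information for each deployment, run prefect init in the root of your project directory. This creates a prefect.yaml, where you can define deployments along with their steps.
You can edit your k8s work pool (which replaces the KubernetesJob infra block as far as defining job variables goes) and then override values on a per-deployment basis (like image and namespace).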
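
The override behavior described above can be pictured with a small sketch. It is a simplified illustration, not Prefect's implementation: resolve_job_variables is a hypothetical helper, and the pool defaults shown (namespace "default", no image) are assumptions about a Kubernetes work pool whose namespace was never changed.

```python
def resolve_job_variables(pool_defaults, deployment_overrides):
    """Apply per-deployment overrides on top of work pool defaults.

    Hypothetical helper for illustration only; keys whose override
    value is None fall back to the pool default.
    """
    resolved = dict(pool_defaults)
    for key, value in deployment_overrides.items():
        if value is not None:
            resolved[key] = value
    return resolved


# Assumed defaults of a Kubernetes work pool that was left unedited.
pool_defaults = {"namespace": "default", "image": None}

# Deployment that only overrides the image, as in the prefect.yaml above.
image_only = resolve_job_variables(
    pool_defaults,
    {"image": "ghcr.io/hongbo-miao/hm-prefect-print-platform"},
)

# Deployment that also overrides the namespace.
with_namespace = resolve_job_variables(
    pool_defaults,
    {
        "image": "ghcr.io/hongbo-miao/hm-prefect-print-platform",
        "namespace": "hm-prefect",
    },
)

print(image_only["namespace"])
print(with_namespace["namespace"])
```

Under these assumptions, the first deployment still targets the pool's namespace, which would match the /namespaces/default/jobs path in the logs. The second case corresponds to setting namespace either on the work pool itself or next to image under work_pool.job_variables in prefect.yaml.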
