Kubernetes Pod fails while all events are Normal: how to dig deeper

Posted by t1qtbnec on 2023-03-01 in Kubernetes
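Question

I am trying to deploy a pod, but it fails with an error I cannot make sense of. The pod is launched by Airflow to perform a specific task. Airflow reports that the pod failed but shows no logs. When I run kubectl describe pod my-pod, I get the output below.

How should I determine the root cause of the problem?

The failing container's section: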

base:
    Container ID:  <ID>
    Image:         <IMAGE>
    Image ID:      <ID>
    Port:          <none>
    Host Port:     <none>
    Command:
      airflow
      run
      /var/airflow/my_dag_name.py
      task_name
      2023-02-20T23:15:00+00:00
      --local
      --pool
      default_pool
      -sd
      /var/airflow/my_dag_name.py
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 20 Feb 2023 20:55:07 -0600
      Finished:     Mon, 20 Feb 2023 20:55:11 -0600
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:                1
      ephemeral-storage:  100Gi
      memory:             8Gi
    Requests:
      cpu:                500m
      ephemeral-storage:  1Gi
      memory:             8Gi
    Environment:
      <ENV VARS>
    Mounts:
      <VARIOUS MOUNTS>

The Events section (complete):

Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  58s   default-scheduler  Successfully assigned <TASK> to <IP>
  Normal  Pulled     58s   kubelet            Container image <SIDECAR IMAGE 1> already present on machine
  Normal  Created    57s   kubelet            Created container <SIDECAR CONTAINER 1>
  Normal  Started    57s   kubelet            Started container <SIDECAR CONTAINER 1>
  Normal  Pulling    54s   kubelet            Pulling image <SIDECAR IMAGE 2>
  Normal  Pulled     53s   kubelet            Successfully pulled image <SIDECAR IMAGE 2> in 125.691281ms
  Normal  Created    53s   kubelet            Created container <SIDECAR CONTAINER 2>
  Normal  Started    53s   kubelet            Started container <SIDECAR CONTAINER 2>
  Normal  Pulled     52s   kubelet            Container image <FAILING POD IMAGE> already present on machine
  Normal  Created    52s   kubelet            Created container <FAILING POD CONTAINER>
  Normal  Started    52s   kubelet            Started container <FAILING POD CONTAINER>
  Normal  Pulled     52s   kubelet            Container image <SIDECAR IMAGE 3> already present on machine
  Normal  Created    52s   kubelet            Created container <SIDECAR CONTAINER 3>
  Normal  Started    52s   kubelet            Started container <SIDECAR CONTAINER 3>
  Normal  Pulled     52s   kubelet            Container image <SIDECAR IMAGE 4> already present on machine
  Normal  Created    52s   kubelet            Created container <SIDECAR CONTAINER 4>
  Normal  Started    51s   kubelet            Started container <SIDECAR CONTAINER 4>
Background

The pod uses these ephemeral sidecars to connect to systems, inject information, and so on.

Answer #1 (hc2pp10m)

In Kubernetes, container exit codes are very useful for diagnosing pod problems. If a pod is unhealthy, the following command helps locate the problem:

kubectl describe pod [POD_NAME]

You have already provided its output, which shows the following:

State: Terminated 
Reason: Error 
Exit Code: 1
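
Because this pod runs several sidecars next to the base container, it can help to see the exit code of every container at a glance rather than reading the full describe output. The sketch below is one possible way to do that; the function name container_exit_codes is hypothetical, and it assumes kubectl already points at the cluster and namespace that hold the pod.

```shell
# Hypothetical helper: print "<container name><TAB><exit code>" for each
# container in a pod. The exit code column is empty for containers that
# have not terminated.
# Assumption: kubectl is configured for the right cluster and namespace.
container_exit_codes() {
  pod="$1"
  kubectl get pod "$pod" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state.terminated.exitCode}{"\n"}{end}'
}

# Usage (my-pod is the pod name used in the question):
#   container_exit_codes my-pod
```

A non-zero value next to base, with the sidecars empty or 0, would confirm that the application container is the one to investigate.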

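Because the container terminated with exit code 1, the container and its application need a closer look: this exit code is mostly caused by an application error or an invalid reference.
As a first step, as Harsh Manvar suggested, use the following command to check the logs of the pod in question. Without a container name it targets the pod's first (default) container, so for a pod with several containers like this one, add -c base to select the failing container.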

kubectl logs <pod-name> -p

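-p stands for --previous: if the container has restarted, it returns the logs of the container's previous instance. Here the Restart Count is 0, so there is no previous instance and the flag can be dropped.
The logs should reveal the root cause of exit code 1, and that information can be used to fix the command field in the pod's YAML file. After updating it, re-apply it to the cluster with kubectl apply.
The information above comes from an article by James Walker (linked in the original answer).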
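
For a multi-container pod such as the one in the question, the log command usually needs an explicit container name. The following is a minimal sketch, not part of the original answer: the function name container_logs is hypothetical, and the container name base and pod name my-pod are taken from the describe output in the question.

```shell
# Hypothetical helper: fetch logs for one named container in a pod.
# It first asks for the previous instance (which only exists after a
# restart) and, if that fails, falls back to the current or most
# recently terminated instance.
# Assumption: kubectl is configured for the right cluster and namespace.
container_logs() {
  pod="$1"
  container="$2"
  kubectl logs "$pod" -c "$container" --previous 2>/dev/null \
    || kubectl logs "$pod" -c "$container"
}

# Usage:
#   container_logs my-pod base
```

With a Restart Count of 0, the first call is expected to fail and the second one returns whatever the base container wrote before exiting.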