Kubernetes container ephemeral storage

5q4ezhmt  posted on 2023-04-29  in Kubernetes

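I have tried deleting every object from the cluster and recreating it, but the problem persists (The node was low on resource: ephemeral-storage). As the output below shows, df reports plenty of free space. microk8s ctr images list shows nothing superfluous. It is worth noting that the df output once spiked to 96% on the system partition, but it did not stay there in later runs.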

light@o-node0:~/lh-orchestrator$ df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           1.2G  1.9M  1.2G   1% /run
/dev/sda2        32G   23G  7.6G  75% /
tmpfs           5.8G     0  5.8G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
shm              64M     0   64M   0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/7b67cbf0dfd60e763e26be4555ed80ec129c14e902eed464e4831e0ae5df3bf8/shm
shm              64M     0   64M   0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/d688bf0e57d05c8165afb32250e6bc216f6072c3b86e96cad26e3caae4c8d856/shm
shm              64M     0   64M   0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/36ec3d840cf8f3c1bd5b4953fa445c71bceb1f0afa5b13fa24855fd17e99bb1e/shm
shm              64M     0   64M   0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/95546ac44384550e3446059da4f7d24cc7a4bfe21d00e67bd924d6a7bd77b650/shm
tmpfs           1.2G  4.0K  1.2G   1% /run/user/1000
shm              64M     0   64M   0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/0b61b66534868b6d588f6ee0d752a416cf5b4849673dde666d1eaf74088e0057/shm
shm              64M     0   64M   0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/0193379ce5b4c60a5c31a402ba0248dd38691d8f4e44952a7bfa2b5db0738149/shm
shm              64M   16K   64M   1% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/1e0381a9c0c430b5ef54f2f7490addb68524220296b55e61b8ecfbdc2c6fbd0c/shm
shm              64M     0   64M   0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/196481a3394e8b66b363fb78bb1824744a090d67af50fb80929c4fc0be78b7e0/shm
shm              64M     0   64M   0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/bb2448e3cecbc98ba64202f175a8c6a21f144bd750eb3e369a0f74e5c6f2df3b/shm
shm              64M     0   64M   0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/1d6f88440c6e5e88656ddcb4441f1700f9b9df2e1598011d01099a9fa4b043e0/shm
shm              64M     0   64M   0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/787853e894de1df24ec204659a3a519a64e6b51776488ee27624cff8baae862b/shm
shm              64M     0   64M   0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/7fe5df9a0b9e96503cac42c7bcfb1a27630ff6534ce075e974e35769697d8bc8/shm
light@o-node0:~$ k describe pod aa114-detect-7b4b986856-c22h6
Name:             aa114-detect-7b4b986856-c22h6
Namespace:        default
Priority:         0
Service Account:  default
Node:             o-node0/192.168.125.74
Start Time:       Thu, 27 Apr 2023 05:43:26 +0000
Labels:           app=aa114-detect
                  pod-template-hash=7b4b986856
Annotations:      cni.projectcalico.org/containerID: 1f7cbc2faac67746999cb3380fddf1a47c0b9748800795a8ea72dfe1feb0a1db
                  cni.projectcalico.org/podIP: 
                  cni.projectcalico.org/podIPs: 
Status:           Failed
Reason:           Evicted
Message:          The node was low on resource: ephemeral-storage. Threshold quantity: 1Gi, available: 0. 
IP:               10.1.234.169
IPs:
  IP:           10.1.234.169
Controlled By:  ReplicaSet/aa114-detect-7b4b986856
Containers:
  aa114-detect:
    Container ID:  
    Image:         registry.dev.mpksoft.ru/lighthouse/lh-detector/detect-runner-shliakhtin.img
    Image ID:      
    Port:          8554/TCP
    Host Port:     0/TCP
    Command:
      /bin/sh
    Args:
      -c
      /usr/bin/lh-detector --object-detector.camera_url="$(CAMERA_URL)" $(DETECTOR_EXTRA_ARGS)
    State:          Terminated
      Reason:       ContainerStatusUnknown
      Message:      The container could not be located when the pod was terminated
      Exit Code:    137
      Started:      Mon, 01 Jan 0001 00:00:00 +0000
      Finished:     Mon, 01 Jan 0001 00:00:00 +0000
    Last State:     Terminated
      Reason:       ContainerStatusUnknown
      Message:      The container could not be located when the pod was deleted.  The container used to be Running
      Exit Code:    137
      Started:      Mon, 01 Jan 0001 00:00:00 +0000
      Finished:     Mon, 01 Jan 0001 00:00:00 +0000
    Ready:          False
    Restart Count:  1
    Environment Variables from:
      aa114-pipel-config  ConfigMap  Optional: false
    Environment:          <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jvpf8 (ro)
Conditions:
  Type               Status
  DisruptionTarget   True 
  Initialized        True 
  Ready              False 
  ContainersReady    False 
  PodScheduled       True 
Volumes:
  kube-api-access-jvpf8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>
Answer 1 (t1qtbnec)

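According to the doc by Red Hat:
When "The node was low on resource: ephemeral-storage." is reported, all pods on the node end up in the Evicted state.
1. In some cases this happens because excessive log messages are consuming the storage. Configure the Docker logging driver to limit the amount of logs that are stored.
2. In other cases, a pod that uses emptyDir without a storage quota fills up this storage.
3. Set a quota to limit this; otherwise any container can write any amount of data to its node's filesystem.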
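One way to apply points 2 and 3 is to declare an ephemeral-storage request and limit on the container and a sizeLimit on the emptyDir volume. The sketch below only builds such a Pod manifest and prints it as JSON, which kubectl accepts alongside YAML. The function name, the pod name, the busybox image, the /scratch mount path, and all sizes are placeholders chosen for illustration, not values taken from the question.

```python
import json


def pod_with_storage_limits(name, image, request="500Mi", limit="2Gi",
                            scratch_limit="1Gi"):
    """Build a Pod manifest (plain dict) that caps ephemeral storage.

    All names and sizes are illustrative assumptions.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "resources": {
                    # Covers the writable layer, container logs and emptyDir usage.
                    "requests": {"ephemeral-storage": request},
                    "limits": {"ephemeral-storage": limit},
                },
                "volumeMounts": [{"name": "scratch", "mountPath": "/scratch"}],
            }],
            "volumes": [{
                "name": "scratch",
                # Quota on the emptyDir volume itself.
                "emptyDir": {"sizeLimit": scratch_limit},
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(pod_with_storage_limits("demo", "busybox"), indent=2))
```

Saved under a hypothetical file name such as make_pod.py, the output could be piped to the cluster with `python3 make_pod.py | microk8s kubectl apply -f -`. With these limits in place, a pod that writes too much should itself be evicted for exceeding its limit, instead of filling the node's filesystem and triggering eviction of everything on the node.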
The following steps are taken from a current industry doc:

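Exit code: 137

If your Kubernetes ecosystem returns "exited with code 137", you are probably running into a memory problem on that system (137 means the process was killed with SIGKILL, most often by the kernel's out-of-memory killer).

There are a few causes of this error, such as memory limits, overcommitted nodes, and memory leaks.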
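To see where the number 137 comes from: by convention, shells and container runtimes report 128 plus the signal number when a process is killed by a signal. The helper below is a small illustration of that arithmetic, assuming a POSIX system; the function name is made up for this example.

```python
import signal


def describe_exit_code(code):
    """Translate a container exit code into a human-readable reason."""
    if code > 128:
        signum = code - 128  # 128 + N means "killed by signal N"
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number unknown on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(137))
```

On Linux, 137 - 128 = 9, which is SIGKILL. The code says that the container was killed forcibly; it does not by itself say whether memory or something else was the reason, so it is worth reading it together with the pod's Reason and Message fields.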
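Fixing exit code 137

If you are trying to fix exit code 137, try the following three approaches:

  • Add extra pod volumes
  • Increase disk space
  • Reduce the number of parallel runners

Kubernetes recommends allocating about 300 MiB of memory to each node in the cluster, which should be enough for the node to function.
However, depending on the complexity of your Kubernetes ecosystem, more memory is always a better idea. If your system has room to spare, assign a larger amount to each node so that it can run without hitting exit code 137.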
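On the pod side, memory sizing is expressed through requests and limits. The pod in the question reports QoS Class: BestEffort, which means its container declares no CPU or memory requests or limits, and such pods are among the first candidates for eviction when a node is under pressure. The sketch below is a simplified re-implementation of the documented QoS rules, not the kubelet's own code: it compares quantity strings literally (so "1Gi" and "1024Mi" would count as different), and the function name is an assumption made for this example.

```python
QOS_RESOURCES = ("cpu", "memory")  # only these two resources decide the QoS class


def qos_class(containers):
    """Return 'Guaranteed', 'Burstable' or 'BestEffort' for a list of
    container specs shaped like {"resources": {"requests": {...}, "limits": {...}}}."""
    any_set = False
    guaranteed = True
    for container in containers:
        res = container.get("resources", {})
        limits = {k: v for k, v in res.get("limits", {}).items()
                  if k in QOS_RESOURCES}
        requests = {k: v for k, v in res.get("requests", {}).items()
                    if k in QOS_RESOURCES}
        # A missing request defaults to the limit for the same resource.
        for key, value in limits.items():
            requests.setdefault(key, value)
        if limits or requests:
            any_set = True
        for key in QOS_RESOURCES:
            if key not in limits or requests[key] != limits[key]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    no_resources = [{"resources": {}}]
    with_memory = [{"resources": {"requests": {"memory": "300Mi"},
                                  "limits": {"memory": "512Mi"}}}]
    print(qos_class(no_resources), qos_class(with_memory))
```

Giving the container at least a memory request moves the pod out of BestEffort. Note that an ephemeral-storage request or limit does not affect the QoS class, because only CPU and memory are considered.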
