I have already tried deleting every object from the cluster and recreating it, but the problem persists (The node was low on resource: ephemeral-storage). See the `df` output below: it shows plenty of free space. `microk8s ctr images list` shows nothing superfluous either. Notably, `df` for the system partition once spiked to 96%, but that did not persist across later runs.
light@o-node0:~/lh-orchestrator$ df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs 1.2G 1.9M 1.2G 1% /run
/dev/sda2 32G 23G 7.6G 75% /
tmpfs 5.8G 0 5.8G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
shm 64M 0 64M 0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/7b67cbf0dfd60e763e26be4555ed80ec129c14e902eed464e4831e0ae5df3bf8/shm
shm 64M 0 64M 0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/d688bf0e57d05c8165afb32250e6bc216f6072c3b86e96cad26e3caae4c8d856/shm
shm 64M 0 64M 0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/36ec3d840cf8f3c1bd5b4953fa445c71bceb1f0afa5b13fa24855fd17e99bb1e/shm
shm 64M 0 64M 0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/95546ac44384550e3446059da4f7d24cc7a4bfe21d00e67bd924d6a7bd77b650/shm
tmpfs 1.2G 4.0K 1.2G 1% /run/user/1000
shm 64M 0 64M 0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/0b61b66534868b6d588f6ee0d752a416cf5b4849673dde666d1eaf74088e0057/shm
shm 64M 0 64M 0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/0193379ce5b4c60a5c31a402ba0248dd38691d8f4e44952a7bfa2b5db0738149/shm
shm 64M 16K 64M 1% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/1e0381a9c0c430b5ef54f2f7490addb68524220296b55e61b8ecfbdc2c6fbd0c/shm
shm 64M 0 64M 0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/196481a3394e8b66b363fb78bb1824744a090d67af50fb80929c4fc0be78b7e0/shm
shm 64M 0 64M 0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/bb2448e3cecbc98ba64202f175a8c6a21f144bd750eb3e369a0f74e5c6f2df3b/shm
shm 64M 0 64M 0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/1d6f88440c6e5e88656ddcb4441f1700f9b9df2e1598011d01099a9fa4b043e0/shm
shm 64M 0 64M 0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/787853e894de1df24ec204659a3a519a64e6b51776488ee27624cff8baae862b/shm
shm 64M 0 64M 0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/7fe5df9a0b9e96503cac42c7bcfb1a27630ff6534ce075e974e35769697d8bc8/shm
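One thing worth checking: `df` on `/` does not necessarily line up with what the kubelet's eviction manager measures. Ephemeral storage is accounted from container writable layers, container logs, and `emptyDir` volumes, and a short-lived spike (like the 96% reading above) is enough to trigger eviction even if the space is freed afterwards. A sketch of where to look on a MicroK8s node (the snap paths below are assumptions for a default install; adjust for yours):

```shell
#!/bin/sh
# Directories the kubelet counts toward ephemeral storage on a default
# MicroK8s snap install (paths are assumptions; adjust for your setup):
#   - container images + writable layers under the snap's containerd dir
#   - pod stdout/stderr logs under /var/log/pods
for d in /var/snap/microk8s/common/var/lib/containerd /var/log/pods; do
  if [ -d "$d" ]; then
    du -sh "$d"   # the biggest consumer usually shows up here at eviction time
  fi
done

# Per-pod numbers straight from the kubelet -- the same accounting the
# eviction manager uses (node name taken from the describe output below):
if command -v microk8s >/dev/null 2>&1; then
  microk8s kubectl get --raw "/api/v1/nodes/o-node0/proxy/stats/summary" \
    | grep -B2 -A4 '"ephemeral-storage"'
fi
```

Running the `du` loop repeatedly while the workload is under load should catch the transient spike that `df` only showed once.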
light@o-node0:~$ k describe pod aa114-detect-7b4b986856-c22h6
Name: aa114-detect-7b4b986856-c22h6
Namespace: default
Priority: 0
Service Account: default
Node: o-node0/192.168.125.74
Start Time: Thu, 27 Apr 2023 05:43:26 +0000
Labels: app=aa114-detect
pod-template-hash=7b4b986856
Annotations: cni.projectcalico.org/containerID: 1f7cbc2faac67746999cb3380fddf1a47c0b9748800795a8ea72dfe1feb0a1db
cni.projectcalico.org/podIP:
cni.projectcalico.org/podIPs:
Status: Failed
Reason: Evicted
Message: The node was low on resource: ephemeral-storage. Threshold quantity: 1Gi, available: 0.
IP: 10.1.234.169
IPs:
IP: 10.1.234.169
Controlled By: ReplicaSet/aa114-detect-7b4b986856
Containers:
aa114-detect:
Container ID:
Image: registry.dev.mpksoft.ru/lighthouse/lh-detector/detect-runner-shliakhtin.img
Image ID:
Port: 8554/TCP
Host Port: 0/TCP
Command:
/bin/sh
Args:
-c
/usr/bin/lh-detector --object-detector.camera_url="$(CAMERA_URL)" $(DETECTOR_EXTRA_ARGS)
State: Terminated
Reason: ContainerStatusUnknown
Message: The container could not be located when the pod was terminated
Exit Code: 137
Started: Mon, 01 Jan 0001 00:00:00 +0000
Finished: Mon, 01 Jan 0001 00:00:00 +0000
Last State: Terminated
Reason: ContainerStatusUnknown
Message: The container could not be located when the pod was deleted. The container used to be Running
Exit Code: 137
Started: Mon, 01 Jan 0001 00:00:00 +0000
Finished: Mon, 01 Jan 0001 00:00:00 +0000
Ready: False
Restart Count: 1
Environment Variables from:
aa114-pipel-config ConfigMap Optional: false
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jvpf8 (ro)
Conditions:
Type Status
DisruptionTarget True
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-jvpf8:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
1 Answer
From a Red Hat doc: all pods on a node end up in the Evicted state because of "The node was low on resource: ephemeral-storage."
1. In some cases this happens because excessive log messages are consuming storage. Configure the Docker logging driver to limit the amount of logs kept.
2. In other cases, pods using an emptyDir volume without a storage quota fill up this storage.
3. Set a quota to limit this; otherwise any container can write an unlimited amount of data to its node's filesystem.
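Points 2 and 3 can be expressed directly in the pod spec. A minimal sketch, with illustrative names and sizes (the image and amounts are hypothetical; size them to your workload):

```yaml
# Illustrative only -- pod/container names and sizes are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: detect-example
spec:
  containers:
  - name: detect
    image: registry.example/detector:latest
    resources:
      requests:
        ephemeral-storage: "1Gi"   # scheduler reserves this much nodefs space
      limits:
        ephemeral-storage: "2Gi"   # kubelet evicts this pod if it exceeds the limit
    volumeMounts:
    - name: scratch
      mountPath: /tmp/scratch
  volumes:
  - name: scratch
    emptyDir:
      sizeLimit: "1Gi"             # caps the emptyDir mentioned in point 2
```

With per-pod limits in place, a runaway writer gets evicted on its own instead of dragging every pod on the node into the Evicted state.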
The following steps come from a doc on industry practice:
Exit code 137
If your Kubernetes cluster returns "exited with code 137", you are likely facing a memory problem (137 = 128 + 9, i.e. the container was killed with SIGKILL, typically by the OOM killer or, as here, by kubelet eviction).
Common causes of this error include memory limits being exceeded, overcommitted nodes, and memory leaks.
Fixing exit code 137
If you are trying to fix exit code 137, try the following:
Kubernetes recommends allocating about 300 MiB of memory for each node in the cluster, which should be enough for the node to work properly.
However, depending on the complexity of your Kubernetes workload, more memory is always a better idea. If your system has the capacity, specify a larger amount of memory to help each node run without hitting exit code 137.
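Note also that the describe output above shows `QoS Class: BestEffort`: the pod declares no requests or limits at all, so it is among the first candidates for eviction under any resource pressure. A hedged sketch of a resources block that moves it into the Burstable class (values are illustrative; size them to the detector's real footprint):

```yaml
# Illustrative values -- measure your container's actual usage first.
resources:
  requests:
    memory: "512Mi"   # declaring requests moves the pod out of BestEffort QoS
    cpu: "250m"
  limits:
    memory: "1Gi"     # the container is OOM-killed (137) only above this
```

Pods with requests set are evicted after BestEffort pods when the node comes under pressure, which on its own may keep this workload alive through transient storage or memory spikes.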