I want Kubernetes to recognize and use the GPUs on servers/PCs equipped with Nvidia GPUs.
So I tried enabling GPU support in Kubernetes by following "NVIDIA device plugin for Kubernetes", but the DaemonSet is not working.
Things done
- install cri-dockerd
confirm: $ cri-dockerd --version
cri-dockerd 0.2.6 (d8accf7)
confirm: $ systemctl status cri-docker.socket
● cri-docker.socket - CRI Docker Socket for the API
Loaded: loaded (/etc/systemd/system/cri-docker.socket; enabled; vendor preset: enabled)
Active: active (running) since Mon 2022-12-05 15:00:35 KST; 18h ago
Triggers: ● cri-docker.service
Listen: /run/cri-dockerd.sock (Stream)
Tasks: 0 (limit: 18968)
Memory: 4.0K
CGroup: /system.slice/cri-docker.socket
Dec 05 15:00:35 hibernation systemd[1]: Starting CRI Docker Socket for the API.
Dec 05 15:00:35 hibernation systemd[1]: Listening on CRI Docker Socket for the API.
- install nvidia-docker
confirm:
$ sudo docker run --rm --gpus all nvidia/cuda:11.3.1-base-ubuntu20.04 nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:07:00.0 Off | N/A |
| 0% 28C P8 14W / 180W | 64MiB / 12052MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
confirm: /etc/docker/daemon.json
{
"default-runtime": "nvidia",
"runtimes": {
"nvidia": {
"path": "nvidia-container-runtime",
"runtimeArgs": []
}
}
}
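To double-check that Docker picked up this configuration, something like the following could be used (the output should list nvidia among the registered runtimes and as the default runtime):
$ sudo docker info | grep -i runtime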
confirm: /etc/containerd/config.toml
# disabled_plugins = ["cri"]   # commented out
version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "nvidia"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
BinaryName = "/usr/bin/nvidia-container-runtime"
- install kubectl=1.22.13-00 kubelet=1.22.13-00 kubeadm=1.22.13-00
confirm: $ kubectl version --client && kubeadm version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.13", GitCommit:"a43c0904d0de10f92aa3956c74489c45e6453d6e", GitTreeState:"clean", BuildDate:"2022-08-17T18:28:56Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
kubeadm version: &version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.13", GitCommit:"a43c0904d0de10f92aa3956c74489c45e6453d6e", GitTreeState:"clean", BuildDate:"2022-08-17T18:27:51Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
confirm: $ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Mon 2022-12-05 15:11:30 KST; 18h ago
Docs: https://kubernetes.io/docs/home/
- master node init
$ sudo kubeadm init \
--pod-network-cidr=10.244.0.0/16 \
--apiserver-advertise-address 192.168.219.100 \
--cri-socket /run/cri-dockerd.sock
confirm: $ kubectl get nodes
NAME STATUS ROLES AGE VERSION
hibernation Ready control-plane,master 3m21s v1.22.13
What I tried
Enabling GPU support in Kubernetes
$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.13.0/nvidia-device-plugin.yml
confirm GPU support: $ kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
I expect GPU to show 1, but I get this:
NAME GPU
hibernation <none>
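For reference, on a node where the device plugin is working, I would expect that same command to print something like this (a sketch using this node's name):
NAME          GPU
hibernation   1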
This command prints nothing: $ kubectl get pod -A | grep nvidia
confirm: $ kubectl describe daemonset nvidia-device-plugin-daemonset -n kube-system
Name: nvidia-device-plugin-daemonset
Selector: name=nvidia-device-plugin-ds
Node-Selector: <none>
Labels: <none>
Annotations: deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Scheduled with Up-to-date Pods: 0
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status: 0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: name=nvidia-device-plugin-ds
Containers:
nvidia-device-plugin-ctr:
Image: nvcr.io/nvidia/k8s-device-plugin:v0.13.0
Port: <none>
Host Port: <none>
Environment:
FAIL_ON_INIT_ERROR: false
Mounts:
/var/lib/kubelet/device-plugins from device-plugin (rw)
Volumes:
device-plugin:
Type: HostPath (bare host directory volume)
Path: /var/lib/kubelet/device-plugins
HostPathType:
Priority Class Name: system-node-critical
Events: <none>
I've noticed this status, but I don't know what I can do about it:
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Scheduled with Up-to-date Pods: 0
Number of Nodes Scheduled with Available Pods: 0
env:
Ubuntu 20.04
GPU: NVIDIA GeForce RTX 3060
Since this is the first installation after formatting the desktop, there are no other unnecessary programs.
1 Answer
I found a way!
Since Kubernetes 1.6, DaemonSets do not schedule on master nodes by default, because kubeadm taints the control-plane node with NoSchedule. In order to schedule the device-plugin DaemonSet on the master, I had to add a toleration to the spec section of its pod template.
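A minimal sketch of that toleration, assuming the control-plane node carries the default kubeadm taint node-role.kubernetes.io/master:NoSchedule (check with $ kubectl describe node hibernation | grep -i taints), added under the DaemonSet's pod template spec:
spec:
  template:
    spec:
      tolerations:
        # allow scheduling onto the tainted control-plane / master node
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
        # toleration already shipped with the v0.13.0 plugin manifest
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule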
If you want to confirm the changes, check that the device-plugin Pod is running and that the node now reports the GPU.
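For instance, reusing the same checks from the question (the device-plugin pod should now be Running and the GPU column should show 1):
$ kubectl get pod -A | grep nvidia
$ kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"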