# NVIDIA DRA Driver for GPUs

Enables

* flexible and powerful allocation and dynamic reconfiguration of GPUs, as well as
* allocation of ComputeDomains for robust and secure Multi-Node NVLink.

For Kubernetes 1.32 or newer, with Dynamic Resource Allocation (DRA) [enabled](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#enabling-dynamic-resource-allocation).

## Overview

DRA is a novel concept in Kubernetes for flexibly requesting, configuring, and sharing specialized devices like GPUs.
To learn more about DRA in general, good starting points are the [Kubernetes docs](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/), the [GKE docs](https://cloud.google.com/kubernetes-engine/docs/concepts/about-dynamic-resource-allocation), and the [Kubernetes blog](https://kubernetes.io/blog/2025/05/01/kubernetes-v1-33-dra-updates/).

Most importantly, DRA puts resource configuration and scheduling in the hands of third-party vendors.

The NVIDIA DRA Driver for GPUs manages two types of resources: **GPUs** and **ComputeDomains**. Correspondingly, it contains two DRA kubelet plugins: the [gpu-kubelet-plugin](https://github.com/NVIDIA/k8s-dra-driver-gpu/tree/main/cmd/gpu-kubelet-plugin) and the [compute-domain-kubelet-plugin](https://github.com/NVIDIA/k8s-dra-driver-gpu/tree/main/cmd/compute-domain-kubelet-plugin). Upon driver installation, each of these two parts can be enabled or disabled separately.

The two sections below provide a brief overview of each of these two parts of this DRA driver.

### `ComputeDomain`s

An abstraction for robust and secure Multi-Node NVLink (MNNVL). Officially supported.

An individual `ComputeDomain` (CD) guarantees MNNVL reachability between pods that are _in_ the CD, and secure isolation from other pods that are _not in_ the CD.

In terms of placement, a CD follows the workload. In terms of lifetime, a CD is ephemeral: its lifetime is bound to that of the consuming workload.
For more background on how `ComputeDomain`s facilitate orchestrating MNNVL workloads on Kubernetes (and on NVIDIA GB200 systems in particular), see [this](https://docs.google.com/document/d/1PrdDofsPFVJuZvcv-vtlI9n2eAh-YVf_fRQLIVmDwVY/edit?tab=t.0#heading=h.qkogm924v5so) doc and [this](https://docs.google.com/presentation/d/1Xupr8IZVAjs5bNFKJnYaK0LE7QWETnJjkz6KOfLu87E/edit?pli=1&slide=id.g28ac369118f_0_1647#slide=id.g28ac369118f_0_1647) slide deck.
For an outlook and specific plans for improvements, please refer to [these](https://github.com/NVIDIA/k8s-dra-driver-gpu/releases/tag/v25.3.0-rc.3) release notes.

If you've heard about IMEX: this DRA driver orchestrates IMEX primitives (daemons, domains, channels) under the hood.
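
As a concrete sketch, a workload can declare a CD with a `ComputeDomain` object, and pods join it by referencing the resource claim template named in its spec. The field names below are illustrative (taken from quickstart-style specs and subject to change between driver versions); consult the `demo/specs/quickstart` directory in this repository for authoritative examples.

```yaml
# Illustrative sketch only: exact API version and field names
# may differ across driver versions.
apiVersion: resource.nvidia.com/v1beta1
kind: ComputeDomain
metadata:
  name: demo-compute-domain
spec:
  numNodes: 2
  channel:
    resourceClaimTemplate:
      # The driver generates a ResourceClaimTemplate with this name;
      # workload pods join the CD by referencing it in spec.resourceClaims.
      name: demo-compute-domain-channel
```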

### `GPU`s

The GPU allocation side of this DRA driver [will enable powerful features](https://docs.google.com/document/d/1BNWqgx_SmZDi-va_V31v3DnuVwYnF2EmN7D-O_fB6Oo) (such as dynamic allocation of MIG devices).
To learn about what we're planning to build, please have a look at [these](https://github.com/NVIDIA/k8s-dra-driver-gpu/releases/tag/v25.3.0-rc.3) release notes.

While some GPU allocation features can be tried out, they are not yet officially supported.
Hence, the GPU kubelet plugin is currently disabled by default in the Helm chart installation.

For exploration and demonstration purposes, see the "demo" section below, and also browse the `demo/specs/quickstart` directory in this repository.
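
To give a flavor of what DRA-based GPU requests look like, here is a minimal `ResourceClaimTemplate` sketch in the style of the quickstart specs. Treat it as illustrative: the exact `resource.k8s.io` API version depends on your Kubernetes release, and the `gpu.nvidia.com` device class is only available where this driver's GPU plugin is enabled.

```yaml
# Illustrative sketch of a DRA-based request for one GPU.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        # Device class advertised by this driver's GPU kubelet plugin.
        deviceClassName: gpu.nvidia.com
```

A pod then lists this template under `spec.resourceClaims` and references the claim from one or more containers.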

## Installation

The recommended installation method is currently via Helm.
Detailed instructions can (for now) be found [here](https://github.com/NVIDIA/k8s-dra-driver-gpu/discussions/249).
In the future, this driver will be included in the [NVIDIA GPU Operator](https://github.com/NVIDIA/gpu-operator) and will no longer need to be installed separately.

## A (kind) demo

Below, we demonstrate a basic use case: sharing a single GPU across two containers running in the same Kubernetes pod.

**Step 1: install dependencies**

Running this demo requires:
* kind (follow the official [installation docs](https://kind.sigs.k8s.io/docs/user/quick-start/#installation))
* the NVIDIA Container Toolkit and Runtime (follow a [previous version](https://github.com/NVIDIA/k8s-dra-driver-gpu/blob/5a4717f1ea613ad47bafccb467582bf2425f20f1/README.md#demo) of this README for setup instructions)

**Step 2: create a kind cluster with the DRA driver installed**

Start by cloning this repository and `cd`ing into it:

```console
git clone https://github.com/NVIDIA/k8s-dra-driver-gpu.git
cd k8s-dra-driver-gpu
```

Next up, build this driver's container image and create a kind-based Kubernetes cluster:

```console
export KIND_CLUSTER_NAME="kind-dra-1"
./demo/clusters/kind/build-dra-driver-gpu.sh
./demo/clusters/kind/create-cluster.sh
```

Now you can install the DRA driver's Helm chart into the Kubernetes cluster:

```console
./demo/clusters/kind/install-dra-driver-gpu.sh
```
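
To verify the installation, check that the driver pods are running. On a single-node kind cluster, the output looks similar to the following (pod name suffixes and counts will differ on your system):

```console
$ kubectl get pods -n nvidia-dra-driver-gpu
NAME                                                READY   STATUS    RESTARTS   AGE
nvidia-dra-driver-gpu-controller-697898fc6b-g85zx   1/1     Running   0          40s
nvidia-dra-driver-gpu-kubelet-plugin-kkwf7          2/2     Running   0          40s
```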

**Step 3: run the workload**

Submit the workload:

```console
kubectl apply -f ./demo/specs/quickstart/gpu-test2.yaml
```

If you're curious, have a look at [the `ResourceClaimTemplate`](https://github.com/jgehrcke/k8s-dra-driver-gpu/blob/526130fbaa3c8f5b1f6dcfd9ef01c9bdd5c229fe/demo/specs/quickstart/gpu-test2.yaml#L12) definition in this spec, and at how the corresponding _single_ `ResourceClaim` is [referenced](https://github.com/jgehrcke/k8s-dra-driver-gpu/blob/526130fbaa3c8f5b1f6dcfd9ef01c9bdd5c229fe/demo/specs/quickstart/gpu-test2.yaml#L46) by both containers.
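
Optionally, you can also inspect the `ResourceClaim` object that Kubernetes generates from that template (assuming the DRA APIs are enabled in your cluster; object names are auto-generated):

```console
kubectl get resourceclaims -n gpu-test2
```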

Inspecting the container logs then confirms that both containers operate on the same GPU device:

```bash
$ kubectl logs pod -n gpu-test2 --all-containers --prefix
[pod/pod/ctr0] GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-4404041a-04cf-1ccf-9e70-f139a9b1e23c)
[pod/pod/ctr1] GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-4404041a-04cf-1ccf-9e70-f139a9b1e23c)
```

## Contributing

Contributions require a Developer Certificate of Origin (DCO; see [CONTRIBUTING.md](https://github.com/NVIDIA/k8s-dra-driver-gpu/blob/main/CONTRIBUTING.md)).

## Support

Please open an issue on the GitHub project for questions and for reporting problems.
Your feedback is appreciated!