Commit f678778

[image] add GKE GPU operator compat paths to ray-llm image PATH, LD_LIBRARY_PATH (#55206)
The ray-llm base image, nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04, no longer includes the GPU operator compatibility paths included in the runtime image.

Additional context on those compatibility paths: https://gitlab.com/nvidia/container-images/cuda/-/issues/47

---------

Signed-off-by: Seiji Eicher <[email protected]>
1 parent 6f3ad1f commit f678778

1 file changed, +12 -2 lines changed

docker/ray-llm/Dockerfile

Lines changed: 12 additions & 2 deletions
@@ -188,6 +188,16 @@ rm -rf "${EP_TEMP_DIR}"
 
 EOF
 
-ENV PATH="${UCX_HOME}/bin:${NIXL_HOME}/bin:${PATH}"
-ENV LD_LIBRARY_PATH="${UCX_HOME}/lib:${NIXL_HOME}/lib/x86_64-linux-gnu:${LD_LIBRARY_PATH}"
+# Q: Why add paths that don't exist in the base image, like /usr/local/nvidia/lib64
+# and /usr/local/nvidia/bin?
+# A: The NVIDIA GPU operator version used by GKE injects these into the container
+# after it's mounted to a pod.
+# Issue is tracked here:
+# https://github.com/GoogleCloudPlatform/compute-gpu-installation/issues/46
+# More context here:
+# https://github.com/NVIDIA/nvidia-container-toolkit/issues/275
+# and here:
+# https://gitlab.com/nvidia/container-images/cuda/-/issues/27
+ENV PATH="${PATH}:${UCX_HOME}/bin:${NIXL_HOME}/bin:/usr/local/nvidia/bin"
+ENV LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${UCX_HOME}/lib:${NIXL_HOME}/lib/x86_64-linux-gnu:/usr/local/nvidia/lib64"
 ENV NIXL_PLUGIN_DIR="${NIXL_HOME}/lib/x86_64-linux-gnu/plugins/"
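
One way to sanity-check this change from inside a running container is to confirm that the two compatibility directories are listed in PATH and LD_LIBRARY_PATH, and whether they exist on disk. The sketch below is not part of the commit; the helper names path_contains and report are hypothetical. Per the Dockerfile comment, the directories are expected to be listed even where they are absent, for example when the image runs outside a GKE GPU pod.

```shell
#!/bin/sh
# Hypothetical helper, not part of the commit.
# path_contains LIST DIR: succeed if DIR is an exact entry of the
# colon-separated LIST (a trailing slash on either side will not match).
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# report NAME LIST DIR: print whether DIR is listed in the variable NAME
# (whose value is LIST) and whether DIR currently exists as a directory.
report() {
    if path_contains "$2" "$3"; then listed=yes; else listed=no; fi
    if [ -d "$3" ]; then present=yes; else present=no; fi
    echo "$1: $3 listed=$listed exists=$present"
}

# The two directories appended by this commit.
report PATH "${PATH:-}" /usr/local/nvidia/bin
report LD_LIBRARY_PATH "${LD_LIBRARY_PATH:-}" /usr/local/nvidia/lib64
```

Since the new ENV lines append to the existing values, both directories would be expected at the end of each list, so entries already present in the base image take precedence during lookup.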
