
--torch-backend=auto fails on systems with multiple GPUs and without /proc/driver/nvidia/version #14647

@coezbek

Description


Summary

This only affects systems that don't have /sys/module/nvidia/version or /proc/driver/nvidia/version, for instance WSL2.

When running uv pip install with the new --torch-backend=auto on such a system with multiple GPUs, detection falls back to nvidia-smi. Unfortunately, nvidia-smi prints a separate line for each graphics card, which the code does not handle:

$ nvidia-smi --query-gpu=driver_version --format=csv,noheader
572.60
572.60

This will make uv pip install with --torch-backend=auto fail with the following error:

uv pip install -U "vllm[audio]" --torch-backend=auto
error: after parsing `572.60
`, found `572.60
`, which is not part of a valid version

nvidia-smi does not respect NVIDIA_VISIBLE_DEVICES, so there is currently no way from the outside to make --torch-backend=auto work with two graphics cards.

The workaround is to run nvidia-smi, read the CUDA version it reports, and pass --torch-backend=cuXXX explicitly.
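For example, if plain nvidia-smi reports CUDA Version: 12.8 for this driver (the exact version here is illustrative), the explicit invocation would be:

$ uv pip install -U "vllm[audio]" --torch-backend=cu128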

I think the incorrectly implemented line of code is the one I marked in this pull request review:

#12070 (review)

Or in the current code base:

let driver_version = Version::from_str(&String::from_utf8(output.stdout)?)?;

The easiest fix would be to take just the first line of nvidia-smi's output. A more elaborate fix would be a way to select which device to query, or to respect the NVIDIA_VISIBLE_DEVICES environment variable.
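A minimal sketch of the first-line fix (the surrounding detection code and error handling are assumed to stay as in the snippet above; only the parsing changes):

let stdout = String::from_utf8(output.stdout)?;
// nvidia-smi prints one driver_version line per GPU; the driver is shared,
// so the lines are identical (see the output above) and the first non-empty
// line is sufficient.
let first_line = stdout
    .lines()
    .find(|line| !line.trim().is_empty())
    .unwrap_or_default();
let driver_version = Version::from_str(first_line.trim())?;

Querying a single device explicitly (e.g. nvidia-smi -i 0) would also avoid the multi-line output, but trimming to the first line keeps the change minimal.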

Platform

Linux 6.6.87.2-microsoft-standard-WSL2 x86_64 GNU/Linux

Version

uv 0.7.21

Python version

Python 3.12.3
