Skip to content

[BUG]v3.11.8 版本 GPU 透传到 虚拟机后, 设备无法正常运行 #21826

@khw934

Description

@khw934

问题描述/What happened:

GPU 透传到 虚拟机后, 设备无法正常运行, 报错如下

/opt/conda/lib/python3.8/site-packages/torch/cuda/init.py:52: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 802: system not yet initialized (Triggered internally at /opt/conda/conda-bld/pytorch_1607370172916/work/c10/cuda/CUDAFunctions.cpp:100.)
return torch._C._cuda_getDeviceCount() > 0

image

环境/Environment:

  • OS (e.g. cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Host: (e.g. dmidecode | egrep -i 'manufacturer|product' |sort -u)
  • Service Version (e.g. kubectl exec -n onecloud $(kubectl get pods -n onecloud | grep climc | awk '{print $1}') -- climc version-list):

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions