
CUDA Out of Memory Error During Inference in samapi Environment #16

@halqadasi

Description

While running inference tasks in the samapi environment, I encountered a CUDA out of memory error, causing the application to fall back to CPU inference. This significantly impacts performance. I'm looking for advice on mitigating this error or any potential fixes.

Environment

  • Operating System: Ubuntu 20.04
  • Python Version: 3.10
  • Anaconda Environment: samapi
  • GPU Model: NVIDIA RTX 4080

Steps to Reproduce

  1. Restart the server to ensure no residual GPU memory usage (a quick way to verify this is sketched after this list).
  2. Activate the samapi environment: source activate samapi
  3. Run the command: uvicorn samapi.main:app --workers 2
  4. The error is encountered after selecting the vit-h model and starting the labeling process.
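
To confirm that step 1 actually left the card empty, here is a minimal sketch (assuming PyTorch is available in the samapi environment) that prints the free memory on each visible CUDA device before the server is started:

```python
import torch

# Report free/total memory for every visible CUDA device before launching uvicorn.
# This reflects the driver's view of the card, so memory held by other processes
# also reduces the "free" figure reported here.
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        free, total = torch.cuda.mem_get_info(i)
        print(f"GPU {i}: {free / 1024**3:.2f} GiB free of {total / 1024**3:.2f} GiB")
else:
    print("No CUDA device visible")
```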

Expected Behavior

I expected the GPU to handle the inference tasks without running out of memory, allowing for faster processing times.

Actual Behavior

Received a warning/error indicating CUDA out of memory. The system defaulted to using the CPU for inference, significantly slowing down the process. The error message was:

/home/.../anaconda3/envs/samapi/lib/python3.10/site-packages/samapi/main.py:152: UserWarning: cuda device found but got the error CUDA out of memory. Tried to allocate 768.00 MiB (GPU 3; 10.75 GiB total capacity; 1.95 GiB already allocated; 244.25 MiB free; 2.00 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF - using CPU for inference
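
The warning itself suggests trying max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF. As a sketch (not a verified fix for samapi, and 128 MiB is only an illustrative value), the variable has to be set before torch initializes CUDA, for example at the very top of the entry module or exported in the shell before running uvicorn:

```python
import os

# PYTORCH_CUDA_ALLOC_CONF must be in the environment before torch touches CUDA;
# the 128 MiB split size is an example value, not a tuned recommendation.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # noqa: E402
```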

Additional Information

  • The issue occurs under both light and heavy workloads.
  • No significant processes were running on the GPU aside from the current task.
  • Attempted solutions: I previously hit the same issue with the label-studio ML Backend. There, the error was caused by loading the SAM vit-h model on every labeling request; loading the model only once, at the start of labeling, fixed it (sketched below). Please see https://github.com/open-mmlab/playground/issues/150
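
For reference, a minimal sketch of that "load once" idea, written against the segment-anything API (the checkpoint path is an example, and this is not how samapi itself is structured):

```python
from functools import lru_cache

from segment_anything import SamPredictor, sam_model_registry


@lru_cache(maxsize=1)
def get_predictor(checkpoint: str = "sam_vit_h_4b8939.pth") -> SamPredictor:
    # Build the vit_h model once, move it to the GPU, and reuse the cached
    # predictor for every subsequent labeling request instead of re-loading
    # the ~2.4 GB checkpoint each time.
    model = sam_model_registry["vit_h"](checkpoint=checkpoint)
    model.to("cuda")
    return SamPredictor(model)
```

Note that each uvicorn worker is a separate process, so with --workers 2 two copies of the model would still be resident on the GPU; that may be relevant to the memory pressure seen above.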
