
CUDA Out of Memory #18

@ArezooGhodsifard

Description


Hello,

I'm encountering a CUDA Out of Memory (OOM) error when PyTorch tries to allocate an additional 768.00 MiB for model inference, even though only about 2 GiB appears to be in use on my NVIDIA GeForce RTX 3060 (6 GB total capacity). The exact warning message is as follows:

UserWarning: cuda device found but got the error CUDA out of memory. Tried to allocate 768.00 MiB (GPU 0; 5.80 GiB total capacity; 1.95 GiB already allocated; 356.75 MiB free; 2.00 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF - using CPU for inference

This occurs when I launch my application with Uvicorn; the model registrations performed at startup appear to trigger the failing allocation. My environment is a Conda environment on Ubuntu with CUDA 11.7 and a PyTorch build compatible with that CUDA version.

Could you provide insights or suggestions on how to manage or mitigate this OOM issue? Are there recommended practices for memory management or configurations specific to PyTorch that I should consider to optimize GPU memory usage and avoid hitting this limit?
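In case it helps frame the question, here is a minimal sketch of what I understand the warning to be suggesting, i.e. configuring the allocator via `PYTORCH_CUDA_ALLOC_CONF` before CUDA is initialized. The `128` value is purely illustrative, not a recommended setting, and the `model`/`batch` names in the comments are hypothetical:

```python
import os

# Allocator tweak mentioned in the warning: cap the block split size to
# reduce fragmentation. The 128 MiB value is an illustrative starting
# point; this must be set before the first CUDA allocation (i.e. before
# torch initializes its CUDA context).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# Once torch is imported, running inference without autograd state and
# releasing cached blocks between requests can also lower peak usage:
# import torch
# with torch.no_grad():
#     output = model(batch)      # hypothetical model and input
# torch.cuda.empty_cache()       # return cached blocks to the driver
```

Is this the kind of configuration you would recommend here, or are there other settings (batch size, precision, per-worker model loading under Uvicorn) that matter more in practice?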

Thank you for your support.
