
GRPO: OOM when init self.vllm #3128

@markoov

Description


When I load the 7B model with vLLM on its own, no OOM error is reported. The error only occurs when I run the GRPO training code with "accelerate launch".
==============error traceback==============
[rank0]: File "/opt/conda/lib/python3.11/site-packages/trl/trainer/grpo_trainer.py", line 404, in __init__
[rank0]: self.llm = LLM(
[rank0]: ^^^^
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/utils.py", line 1051, in inner
[rank0]: return fn(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/entrypoints/llm.py", line 242, in __init__
[rank0]: self.llm_engine = self.engine_class.from_engine_args(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 484, in from_engine_args
[rank0]: engine = cls(
[rank0]: ^^^^
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 276, in __init__
[rank0]: self._initialize_kv_caches()
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 429, in _initialize_kv_caches
[rank0]: self.model_executor.initialize_cache(num_gpu_blocks, num_cpu_blocks)
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 121, in initialize_cache
[rank0]: self.collective_rpc("initialize_cache",
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 51, in collective_rpc
[rank0]: answer = run_method(self.driver_worker, method, args, kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/utils.py", line 2220, in run_method
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/worker/worker.py", line 306, in initialize_cache
[rank0]: self._init_cache_engine()
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/worker/worker.py", line 311, in _init_cache_engine
[rank0]: self.cache_engine = [
[rank0]: ^
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/worker/worker.py", line 312, in <listcomp>
[rank0]: CacheEngine(self.cache_config, self.model_config,
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/worker/cache_engine.py", line 69, in __init__
[rank0]: self.gpu_cache = self._allocate_kv_cache(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/worker/cache_engine.py", line 103, in _allocate_kv_cache
[rank0]: layer_kv_cache = torch.zeros(alloc_shape,
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 780.00 MiB. GPU 3 has a total capacity of 23.68 GiB of which 698.94 MiB is free. Process 1953163 has 22.99 GiB memory in use. Of the allocated memory 22.64 GiB is allocated by PyTorch, and 47.18 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
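The numbers in the traceback show why the allocation fails: the training process already holds nearly all of GPU 3, and vLLM sizes its KV cache as a fraction of the card's *total* memory, not of what is currently free. A small stdlib-only sketch of that arithmetic (the helper below is illustrative, not part of trl or vllm):

```python
# Illustrative arithmetic taken from the OOM message above.
MIB = 1024 ** 2
GIB = 1024 ** 3

total = 23.68 * GIB       # total capacity of GPU 3
free = 698.94 * MIB       # memory actually left after training init
requested = 780.00 * MIB  # KV-cache tensor vLLM tried to allocate

# The allocation fails because the request exceeds the free memory:
assert requested > free

# A gpu_memory_utilization that would fit in the remaining headroom
# would have to be tiny -- far too small for a usable KV cache:
safe_fraction = free / total
print(f"free: {free / GIB:.2f} GiB -> utilization would need to be <= {safe_fraction:.3f}")
```

In other words, lowering `gpu_memory_utilization` alone cannot rescue this setup; vLLM needs a GPU that the training processes are not occupying.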

===========train env=================
model : Qwen/Qwen2.5-7B-Instruct
GPU : 4 × RTX 3090 (24 GB each)
python : 3.11
trl : 0.15.2
torch : 2.5.1+cu124
transformers : 4.49.0
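A common workaround with trl 0.15.x is to leave one GPU out of the accelerate process group and dedicate it to vLLM. The sketch below assumes the `vllm_device` and `vllm_gpu_memory_utilization` fields of `GRPOConfig` as they existed around that release (check your installed version; the field names are an assumption here, and the output path is hypothetical):

```python
from trl import GRPOConfig

config = GRPOConfig(
    output_dir="qwen2.5-7b-grpo",      # hypothetical output path
    use_vllm=True,
    vllm_device="cuda:3",              # GPU reserved for generation only
    vllm_gpu_memory_utilization=0.85,  # fraction of *that* GPU vLLM may claim
)
```

Then train on the remaining three GPUs so that cuda:3 stays empty for vLLM, e.g. `accelerate launch --num_processes 3 train_grpo.py`.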
