Description
When I load the 7B model with vLLM on its own, no OOM error is reported. The OOM only occurs when I run the GRPO training code with "accelerate launch".
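The actual training script is not shown here; the following is a minimal sketch of what such a GRPO + vLLM setup typically looks like with trl 0.15.x. The dataset, reward function, output directory, and hyperparameters are placeholders, not the real values from my run.

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder reward: prefer shorter completions (stand-in for the real reward function).
def reward_len(completions, **kwargs):
    return [-float(len(c)) for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder dataset

training_args = GRPOConfig(
    output_dir="qwen2.5-7b-grpo",
    use_vllm=True,                    # GRPOTrainer builds an in-process vLLM LLM(...) for generation
    vllm_gpu_memory_utilization=0.9,  # fraction of GPU memory vLLM reserves; the KV-cache allocation in the traceback happens under this budget
    max_completion_length=256,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()

With use_vllm=True, the trainer constructs the LLM(...) instance seen at the top of the traceback on the device selected by vllm_device, so any memory already held by a training process on that GPU counts against the KV-cache allocation. That appears consistent with the traceback below, where GPU 3 already has ~23 GiB in use by another process when vLLM tries to allocate its cache.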
==============error traceback==============
[rank0]: File "/opt/conda/lib/python3.11/site-packages/trl/trainer/grpo_trainer.py", line 404, in init
[rank0]: self.llm = LLM(
[rank0]: ^^^^
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/utils.py", line 1051, in inner
[rank0]: return fn(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/entrypoints/llm.py", line 242, in init
[rank0]: self.llm_engine = self.engine_class.from_engine_args(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 484, in from_engine_args
[rank0]: engine = cls(
[rank0]: ^^^^
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 276, in init
[rank0]: self._initialize_kv_caches()
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 429, in _initialize_kv_caches
[rank0]: self.model_executor.initialize_cache(num_gpu_blocks, num_cpu_blocks)
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 121, in initialize_cache
[rank0]: self.collective_rpc("initialize_cache",
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 51, in collective_rpc
[rank0]: answer = run_method(self.driver_worker, method, args, kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/utils.py", line 2220, in run_method
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/worker/worker.py", line 306, in initialize_cache
[rank0]: self._init_cache_engine()
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/worker/worker.py", line 311, in _init_cache_engine
[rank0]: self.cache_engine = [
[rank0]: ^
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/worker/worker.py", line 312, in
[rank0]: CacheEngine(self.cache_config, self.model_config,
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/worker/cache_engine.py", line 69, in init
[rank0]: self.gpu_cache = self._allocate_kv_cache(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/worker/cache_engine.py", line 103, in _allocate_kv_cache
[rank0]: layer_kv_cache = torch.zeros(alloc_shape,
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 780.00 MiB. GPU 3 has a total capacity of 23.68 GiB of which 698.94 MiB is free. Process 1953163 has 22.99 GiB memory in use. Of the allocated memory 22.64 GiB is allocated by PyTorch, and 47.18 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
===========training environment=================
model : Qwen/Qwen2.5-7B-Instruct
GPU : RTX 3090 24 GB × 4
python : 3.11
trl : 0.15.2
torch : 2.5.1+cu124
transformers : 4.49.0
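The exact launch command was not included. With trl 0.15.x and use_vllm=True, generation is meant to run on a GPU of its own, so on a 4-GPU machine the usual pattern is to launch one fewer training process than GPUs (the script name below is a placeholder):

accelerate launch --num_processes 3 train_grpo.py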