
Support trl vllm-serve for multi-gpu vLLM inference #2184

@JC-LMCO

Description

With huggingface/trl#3094, TRL will support vLLM for generation (at least in GRPO) by launching a server with trl vllm-serve --model-name. This means we can now use vLLM for larger models that require multi-GPU setups, by setting different CUDA_VISIBLE_DEVICES for the vLLM process and the training process. It looks like PEFT support for it is coming soon. In theory, you could then fine-tune Llama 3 70B in 4-bit with GRPO and Unsloth (if you happen to have, say, three A100s linked together and a lot of time).
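To make the two-process setup concrete, here's a rough sketch of how it could look once this lands. Everything below is an assumption based on the linked TRL PR, not a confirmed Unsloth API: the GRPOConfig server fields (vllm_server_host / vllm_server_port) may be named differently in the released version, and the dataset, reward function, and model name are just placeholders.

```python
# Terminal 1 (generation GPUs): serve the model with vLLM
#   CUDA_VISIBLE_DEVICES=1,2 trl vllm-serve --model-name meta-llama/Meta-Llama-3-70B-Instruct
#
# Terminal 2 (training GPU): run the script below
#   CUDA_VISIBLE_DEVICES=0 python train_grpo.py

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder dataset

def reward_len(completions, **kwargs):
    # Toy reward: prefer shorter completions (placeholder for a real reward function).
    return [-float(len(c)) for c in completions]

config = GRPOConfig(
    output_dir="grpo-llama3-70b",
    use_vllm=True,
    # Assumed fields for pointing the trainer at the external vLLM server
    # (names taken from the TRL PR; they may differ in the released API):
    vllm_server_host="127.0.0.1",
    vllm_server_port=8000,
)

trainer = GRPOTrainer(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # here this would be an Unsloth 4-bit model
    reward_funcs=reward_len,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```

The key point is that generation and training never share a GPU: the vLLM server can shard the model across its own devices, while the trainer keeps its single training GPU.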

I may be jumping the gun a bit here, but I've been looking forward to multi-GPU vLLM support in TRL for a while now and would love to see it integrated with Unsloth (even if we're still limited to single-GPU training for now).

Metadata

Labels: feature request (Feature request pending on roadmap)
