
Support trl vllm-serve for multi-gpu vLLM inference #2184

@JC-LMCO

Description

With huggingface/trl#3094, TRL will support vLLM for generation (at least in GRPO) by launching a server with trl vllm-serve --model-name. This means we can now use vLLM for larger models that require multi-GPU setups, by setting different CUDA_VISIBLE_DEVICES for the vLLM process and the training process. It looks like PEFT support for it is coming soon. In theory, you could then fine-tune Llama 3 70B in 4-bit with GRPO and Unsloth (if you happen to have, say, three A100s linked together and a lot of time).
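To make the two-process setup concrete, here's a rough sketch of how it could look once this lands. Everything below is an assumption based on the linked TRL PR, not a confirmed Unsloth API: the GRPOConfig server fields (vllm_server_host / vllm_server_port) may be named differently in the released version, and the dataset, reward function, and model name are just placeholders.

```python
# Terminal 1 (generation GPUs): serve the model with vLLM
#   CUDA_VISIBLE_DEVICES=1,2 trl vllm-serve --model-name meta-llama/Meta-Llama-3-70B-Instruct
#
# Terminal 2 (training GPU): run the script below
#   CUDA_VISIBLE_DEVICES=0 python train_grpo.py

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder dataset

def reward_len(completions, **kwargs):
    # Toy reward: prefer shorter completions (placeholder for a real reward function).
    return [-float(len(c)) for c in completions]

config = GRPOConfig(
    output_dir="grpo-llama3-70b",
    use_vllm=True,
    # Assumed fields for pointing the trainer at the external vLLM server
    # (names taken from the TRL PR; they may differ in the released API):
    vllm_server_host="127.0.0.1",
    vllm_server_port=8000,
)

trainer = GRPOTrainer(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # here this would be an Unsloth 4-bit model
    reward_funcs=reward_len,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```

The key point is that generation and training never share a GPU: the vLLM server can shard the model across its own devices, while the trainer keeps its single training GPU.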

I may be jumping the gun a bit here, but I've been looking forward to multi-GPU vLLM support in TRL for a while now and would love to see it integrated with Unsloth (even if we're still limited to single-GPU training for now).

Metadata

Labels: feature request (Feature request pending on roadmap)
