
[Usage]: How to use multi-instance in vLLM? (Model replication on multiple GPUs) #6155


Description

@KimMinSang96

I would like to use a feature like the Multi-Instance Support provided by the tensorrt-llm backend. In its documentation, I can see that multiple model instances are served using modes such as Leader mode and Orchestrator mode. Does vLLM support this functionality natively, or should I implement it myself, similarly to the tensorrt-llm backend?

Reference URL: https://github.com/triton-inference-server/tensorrtllm_backend?tab=readme-ov-file#leader-mode
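
For context, the workaround I am currently considering (not an official vLLM feature, as far as I can tell) is to launch one OpenAI-compatible vLLM server per GPU and load-balance across them externally. A rough sketch, with the model name, GPU IDs, and ports as placeholders:

```python
import os
import subprocess

MODEL = "meta-llama/Llama-2-7b-hf"  # placeholder model id
GPUS_AND_PORTS = [("0", 8000), ("1", 8001)]  # one replica per GPU

procs = []
for gpu, port in GPUS_AND_PORTS:
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = gpu  # pin this replica to a single GPU
    procs.append(subprocess.Popen(
        ["python", "-m", "vllm.entrypoints.openai.api_server",
         "--model", MODEL, "--port", str(port)],
        env=env,
    ))

# Block until the servers exit (e.g. on Ctrl-C).
for p in procs:
    p.wait()
```

Requests could then be spread over ports 8000 and 8001 by any HTTP load balancer (e.g. nginx round-robin), but as far as I can tell this gives no equivalent of the leader/orchestrator coordination that the tensorrt-llm backend provides.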
