Open
Labels: good first issue, help wanted
Description
Checklist
- 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- 2. Please use English, otherwise it will be closed.
Motivation
We noticed that in vLLM's `_get_logits` function, `gather` is used instead of `all_gather` under certain conditions (mainly, when the device is not a TPU):
Code link:
For reference, the switch from `all_gather` to `gather` was originally introduced in this PR: vllm-project/vllm#2221.
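To make the difference concrete, here is a minimal pure-Python simulation. It does no real communication, and `simulate_all_gather` / `simulate_gather` are hypothetical helpers, not vLLM or `torch.distributed` APIs. Each tensor-parallel rank holds one vocabulary shard of the logits. `all_gather` leaves the full logits on every rank, while `gather` leaves them only on a destination rank, which suffices if (as assumed here) only that rank consumes the logits. The element counts compare aggregate data volume only, not latency: with P ranks, `gather` moves about 1/P as much data as `all_gather`.

```python
from typing import List, Optional

Shard = List[List[float]]  # [batch][vocab_shard] logits held by one rank


def concat_vocab(shards: List[Shard]) -> List[List[float]]:
    """Concatenate per-rank vocab shards along the last (vocab) dimension."""
    batch = len(shards[0])
    return [sum((shard[b] for shard in shards), []) for b in range(batch)]


def simulate_all_gather(shards: List[Shard]):
    """Every rank ends up with the full logits; each receives the P-1 remote shards."""
    full = concat_vocab(shards)
    results = [full for _ in shards]
    received = sum(
        sum(len(row) for s in shards[:r] + shards[r + 1:] for row in s)
        for r in range(len(shards))
    )
    return results, received


def simulate_gather(shards: List[Shard], dst: int = 0):
    """Only rank `dst` gets the full logits; every other rank gets None."""
    full = concat_vocab(shards)
    results: List[Optional[List[List[float]]]] = [
        full if r == dst else None for r in range(len(shards))
    ]
    received = sum(len(row) for r, s in enumerate(shards) if r != dst for row in s)
    return results, received


# 4 ranks, batch of 2, vocab of 8 -> each rank holds 2 vocab entries per row.
P, B, V = 4, 2, 8
shards = [
    [[float(r * (V // P) + j) for j in range(V // P)] for _ in range(B)]
    for r in range(P)
]
ag, ag_recv = simulate_all_gather(shards)
g, g_recv = simulate_gather(shards)
print(ag_recv, g_recv)  # 48 12
```

Each shard here has B * V / P = 4 elements, so `all_gather` delivers 3 shards to each of 4 ranks (48 elements in total), while `gather` delivers 3 shards to rank 0 only (12 elements).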
In SGLang, by contrast, `all_gather` is currently always used:
`logits = tensor_model_parallel_all_gather(logits)`
Does SGLang plan to support `gather` (instead of only `all_gather`) when gathering the logits? Judging from vLLM's practice, `gather` seems to perform better than `all_gather` on devices that support it.
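One way to picture the requested change is a flag that selects the collective, similar in spirit to the platform condition in vLLM. The sketch below is a pure-Python stand-in with no real communication; `collect_logits`, `step`, `use_gather` and `is_tpu` are hypothetical names, not SGLang or vLLM APIs. It also shows the main constraint of this approach: with `gather`, only the destination rank can sample, so any other rank that needs the sampled tokens would have to receive them some other way.

```python
from typing import List, Optional

Logits = List[List[float]]  # [batch][vocab]


def collect_logits(
    shards: List[Logits], rank: int, use_gather: bool, dst: int = 0
) -> Optional[Logits]:
    """Return the logits `rank` would hold after the collective (stand-in only).

    use_gather=True  -> gather-style: only `dst` receives the full logits.
    use_gather=False -> all_gather-style: every rank receives them.
    """
    full = [sum((shard[b] for shard in shards), []) for b in range(len(shards[0]))]
    if use_gather and rank != dst:
        return None
    return full


def greedy_sample(logits: Logits) -> List[int]:
    """Pick the argmax token id for each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def step(shards: List[Logits], rank: int, is_tpu: bool) -> Optional[List[int]]:
    # Hypothetical platform switch: use gather everywhere except on TPU.
    use_gather = not is_tpu
    logits = collect_logits(shards, rank, use_gather)
    if logits is None:
        # Non-destination ranks have nothing to sample from; a real system
        # would need to send the sampled ids to them if they require them.
        return None
    return greedy_sample(logits)


shards = [[[0.1, 0.9]], [[0.3, 0.2]]]  # 2 ranks, batch of 1, vocab of 4
print([step(shards, r, is_tpu=False) for r in range(2)])  # [[1], None]
print([step(shards, r, is_tpu=True) for r in range(2)])   # [[1], [1]]
```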
Related resources
No response