[Feature] support gather instead of all_gather when gathering the logits #3365

@chunyuan-w

Motivation

We noticed that the _get_logits function in vLLM uses gather instead of all_gather under certain conditions (mainly, on non-TPU devices):
Code link:

The switch from all_gather to gather was originally introduced in this PR, for reference: vllm-project/vllm#2221.

In SGLang, by contrast, all_gather is currently always used:

logits = tensor_model_parallel_all_gather(logits)

Does SGLang plan to support gather, in addition to all_gather, when gathering the logits? Based on vLLM's practice, gather appears to perform better than all_gather on devices that support it.
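
To illustrate the difference being asked about, here is a toy single-process sketch in plain Python. It is not vLLM or SGLang code and performs no real communication; the function names (simulate_all_gather, simulate_gather, elements_received) are hypothetical. It assumes the logits are sharded along the vocab dimension across tensor-parallel ranks and only counts how many elements each rank would receive from its peers.

```python
# Toy single-process model of the two collectives; no real communication happens.
# All names here are hypothetical and not taken from vLLM or SGLang.

def simulate_all_gather(shards):
    """Every rank ends up with the full concatenated logits."""
    full = [x for shard in shards for x in shard]
    return [list(full) for _ in shards]


def simulate_gather(shards, dst=0):
    """Only rank `dst` ends up with the full logits; other ranks get None."""
    full = [x for shard in shards for x in shard]
    return [list(full) if rank == dst else None for rank in range(len(shards))]


def elements_received(results, shards):
    """Count the elements each rank receives from other ranks, summed over ranks."""
    total = 0
    for rank, out in enumerate(results):
        if out is not None:
            total += len(out) - len(shards[rank])
    return total


# Four tensor-parallel ranks, each holding a 2-element slice of the vocab dimension.
shards = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

all_gathered = simulate_all_gather(shards)
gathered = simulate_gather(shards, dst=0)

all_gather_traffic = elements_received(all_gathered, shards)  # 4 ranks * 6 elements
gather_traffic = elements_received(gathered, shards)          # 1 rank * 6 elements
```

In this model, gather delivers the full logits only to the destination rank, which is enough if sampling happens on a single rank. The element counts are only a rough proxy; real performance depends on the backend and interconnect.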

Related resources

No response
