[Core] Support multiple tasks per model #20771
Conversation
Summary of Changes
Hello @NickLucche, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request refactors the model configuration and task resolution logic to allow models that support transcription to also expose text generation (completions) endpoints. It achieves this by reclassifying 'transcription' as a task rather than a runner type, and introduces a dedicated 'transcription_only' runner for models like Whisper that are limited to transcription without generation capabilities. These changes provide greater flexibility in how models are configured and served.
Highlights
- Enhanced Model Capabilities: This pull request enables models implementing the `SupportsTranscription` interface (e.g., new multimodal models) to expose both transcription and text generation (completions) endpoints.
- Task/Runner Type Redefinition: The `transcription` capability is now treated as a task that can be handled by a `generate` runner, rather than being a standalone `runner_type`. This allows for more flexible model configurations.
- New `transcription_only` Runner: A dedicated `transcription_only` runner type has been introduced to specifically accommodate models, such as Whisper, that support transcription but are not designed for general text generation.
- Dynamic Task Resolution Logic: The internal `_resolve_task` function in `vllm/config.py` has been significantly refactored to dynamically determine the appropriate runner type and supported tasks based on a model's capabilities, including special handling for `transcription_only` models.
- API Server Alignment: The API server's initialization logic for transcription and translation services has been updated to check `model_config.task` instead of `model_config.runner_type`, aligning with the new classification of `transcription` as a task (see the sketch after this list).
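To illustrate the API-server alignment point, here is a minimal sketch of the gating change, assuming the attribute names from the summary above; the function name `should_enable_transcription` is hypothetical, not the actual vLLM code:

```python
def should_enable_transcription(model_config) -> bool:
    # Hypothetical sketch of the gating change described above. Before,
    # the API server checked the runner type:
    #     model_config.runner_type == "transcription"
    # After, it checks the resolved task, so a `generate` runner can also
    # serve transcription endpoints when the model supports the task.
    return model_config.task == "transcription"
```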
Code Review
Great work on this pull request! The changes to make `transcription` a task and introduce a `transcription_only` runner are well-designed and address the limitations of the previous implementation. This will certainly make it easier to integrate new multimodal models that support both transcription and generation.
I've left a few comments, including a high-severity bug fix for task resolution and some suggestions for code simplification to improve maintainability. Once these are addressed, this PR should be in excellent shape.
vllm/config.py
Outdated
```python
if task == "transcription":
    return registry.is_transcription_model(architectures)
elif task == "score":
    return runner_support["pooling"]
return True
```
The check for `task == "score"` appears to be dead code. The `task` parameter in `is_task_supported` is of type `_ResolvedTask`, which is defined as `Literal["generate", "embed", "classify", "reward", "draft", "transcription"]`. It does not include `"score"`.

The `"score"` task option is handled later in `_resolve_task`, where it is converted to either `"embed"` or `"classify"`. Therefore, `is_task_supported` will never be called with `task="score"`.

You can safely remove this `elif` block to improve code clarity.
```python
elif task == "transcription":
    return registry.is_transcription_model(architectures)
return True
```
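For context, a hypothetical sketch of the conversion the comment refers to; the helper name `normalize_score_task` and the classifier check are assumptions, not the exact vLLM code:

```python
def normalize_score_task(is_classification_model: bool) -> str:
    # Hypothetical helper illustrating the conversion described above:
    # "score" is rewritten to a concrete task inside _resolve_task before
    # is_task_supported runs, so the `elif task == "score"` branch is
    # unreachable.
    return "classify" if is_classification_model else "embed"
```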
Allow `SupportsTranscription` models to expose completions/generate endpoints
```python
if self.runner_type in ("draft",
                        "generate") and self.task != "transcription":
```
This is just to keep the logic exactly the same as before. I'm not sure whether the `transcription` task really needs `truncation_side="right"`; feel free to change it.
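For reference, a simplified sketch of the guard in context; the `"left"` fallback below is an assumption for illustration, not the confirmed vLLM default:

```python
def resolve_truncation_side(runner_type: str, task: str) -> str:
    # Simplified sketch of the guard from the diff above: right-side
    # truncation applies to draft/generate runners, now explicitly
    # excluding the transcription task. The "left" fallback is an
    # assumption, included only to make the sketch self-contained.
    if runner_type in ("draft", "generate") and task != "transcription":
        return "right"
    return "left"
```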
LGTM, merged
As we open up the transcriptions API to multimodal models, we want to make sure the completions API is also available to the models implementing the `SupportsTranscription` interface.

Currently this is not possible, because `transcription` is a `runner_type` and completions endpoints are only available to `generate` runners.

One solution is to have `transcription` as a task rather than a `runner_type`, similarly to `embed`, so that we can expose both.

Hence, for new models that support transcription, I want the default runner-task pair to resolve to a `generate` runner that also supports the `transcription` task. At the same time, I want to maintain the current Whisper limit, which is that it cannot expose completions. Hence, Whisper models need to resolve to the dedicated `transcription_only` runner.

With the way we currently organize the task->runner mapping, this is not representable, because the mapping is not unique.

Therefore, I resolve `runner_type` more dynamically inside `_resolve_task`, allowing for greater flexibility.

For handling the Whisper limitation, a `transcription_only` runner is added, with the purpose of not allowing the `runner_type` to be `generate`. This shouldn't interfere with other recent uses of Whisper in the `score` task.

cc @DarkLight1337, let me know if you see a cleaner way of designing this, thanks!
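To make the two intended resolutions concrete, here is a minimal sketch of the mapping described above; `sketch_resolve` is a hypothetical helper, not the actual `_resolve_task` implementation:

```python
from typing import Literal

RunnerType = Literal["generate", "pooling", "draft", "transcription_only"]
Task = Literal["generate", "embed", "classify", "reward", "draft",
               "transcription"]

def sketch_resolve(supports_transcription: bool,
                   supports_generation: bool) -> tuple[RunnerType, list[Task]]:
    """Hypothetical sketch of the dynamic resolution described above."""
    if supports_transcription and not supports_generation:
        # Whisper-style models: a dedicated runner that cannot expose
        # completions endpoints.
        return "transcription_only", ["transcription"]
    tasks: list[Task] = ["generate"]
    if supports_transcription:
        # New multimodal models get both completions and transcription.
        tasks.append("transcription")
    return "generate", tasks

# Example: a new multimodal transcription model resolves to
# ("generate", ["generate", "transcription"]), while Whisper resolves to
# ("transcription_only", ["transcription"]).
```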
Implementation Details
See #20771 (comment)