[Core] Support multiple tasks per model #20771
Conversation
Summary of Changes
Hello @NickLucche, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request refactors the model configuration and task resolution logic to allow models that support transcription to also expose text generation (completions) endpoints. It achieves this by reclassifying 'transcription' as a task rather than a runner type, and introduces a dedicated 'transcription_only' runner for models like Whisper that are limited to transcription without generation capabilities. These changes provide greater flexibility in how models are configured and served.
Highlights
- Enhanced Model Capabilities: This pull request enables models implementing the `SupportsTranscription` interface (e.g., new multimodal models) to expose both transcription and text generation (completions) endpoints.
- Task/Runner Type Redefinition: The `transcription` capability is now treated as a task that can be handled by a `generate` runner, rather than being a standalone `runner_type`. This allows for more flexible model configurations.
- New `transcription_only` Runner: A dedicated `transcription_only` runner type has been introduced to specifically accommodate models, such as Whisper, that support transcription but are not designed for general text generation.
- Dynamic Task Resolution Logic: The internal `_resolve_task` function in `vllm/config.py` has been significantly refactored to dynamically determine the appropriate runner type and supported tasks based on a model's capabilities, including special handling for `transcription_only` models.
- API Server Alignment: The API server's initialization logic for transcription and translation services has been updated to check `model_config.task` instead of `model_config.runner_type`, aligning with the new classification of `transcription` as a task (see the sketch after this list).
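To illustrate the API-server alignment point, here is a minimal sketch of the gating change, assuming the attribute names from the summary above; the function name `should_enable_transcription` is hypothetical, not the actual vLLM code:

```python
def should_enable_transcription(model_config) -> bool:
    # Hypothetical sketch of the gating change described above. Before,
    # the API server checked the runner type:
    #     model_config.runner_type == "transcription"
    # After, it checks the resolved task, so a `generate` runner can also
    # serve transcription endpoints when the model supports the task.
    return model_config.task == "transcription"
```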
Code Review
Great work on this pull request! The changes to make `transcription` a task and introduce a `transcription_only` runner are well-designed and address the limitations of the previous implementation. This will certainly make it easier to integrate new multimodal models that support both transcription and generation.
I've left a few comments, including a high-severity bug fix for task resolution and some suggestions for code simplification to improve maintainability. Once these are addressed, this PR should be in excellent shape.
vllm/config.py
Outdated
```python
if task == "transcription":
    return registry.is_transcription_model(architectures)
elif task == "score":
    return runner_support["pooling"]
return True
```
The check for `task == "score"` appears to be dead code. The `task` parameter in `is_task_supported` is of type `_ResolvedTask`, which is defined as `Literal["generate", "embed", "classify", "reward", "draft", "transcription"]`. It does not include `"score"`.

The `"score"` task option is handled later in `_resolve_task`, where it is converted to either `"embed"` or `"classify"`. Therefore, `is_task_supported` will never be called with `task="score"`.

You can safely remove this `elif` block to improve code clarity.
```python
elif task == "transcription":
    return registry.is_transcription_model(architectures)
return True
```
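For context, a hypothetical sketch of the conversion the comment refers to; the helper name `normalize_score_task` and the classifier check are assumptions, not the exact vLLM code:

```python
def normalize_score_task(is_classification_model: bool) -> str:
    # Hypothetical helper illustrating the conversion described above:
    # "score" is rewritten to a concrete task inside _resolve_task before
    # is_task_supported runs, so the `elif task == "score"` branch is
    # unreachable.
    return "classify" if is_classification_model else "embed"
```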
Allow `SupportsTranscription` models to expose completions/generate endpoints
```python
if self.runner_type in ("draft",
                        "generate") and self.task != "transcription":
```
This is just to keep the logic exactly the same as before. I'm not sure whether the `transcription` task really needs `truncation_side="right"`; feel free to change it.
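For reference, a simplified sketch of the guard in context; the `"left"` fallback below is an assumption for illustration, not the confirmed vLLM default:

```python
def resolve_truncation_side(runner_type: str, task: str) -> str:
    # Simplified sketch of the guard from the diff above: right-side
    # truncation applies to draft/generate runners, now explicitly
    # excluding the transcription task. The "left" fallback is an
    # assumption, included only to make the sketch self-contained.
    if runner_type in ("draft", "generate") and task != "transcription":
        return "right"
    return "left"
```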
LGTM, merged
As we open up the transcriptions API to multimodal models, we want to make sure the completions API is also available to the models implementing the `SupportsTranscription` interface.

Currently this is not possible, because `transcription` is a `runner_type` and completions endpoints are only available to `generate` runners.

One solution is to have `transcription` as a task rather than a `runner_type`, similarly to `embed`, so that we can expose both.

Hence, for new models that support transcription, I want the default runner-task pair to resolve to a `generate` runner that also supports the `transcription` task. At the same time, I want to maintain the current Whisper limit, which is that it cannot expose completions. Hence, Whisper models need to resolve to the dedicated `transcription_only` runner.

With the way we currently organize the task->runner mapping, this is not representable, because the mapping is not unique.

Therefore, I resolve `runner_type` more dynamically inside `_resolve_task`, allowing for greater flexibility.

For handling the Whisper limitation, a `transcription_only` runner is added, with the purpose of not allowing the `runner_type` to be `generate`. This shouldn't interfere with other recent uses of Whisper in the `score` task.

cc @DarkLight1337, let me know if you see a cleaner way of designing this, thanks!
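To make the two intended resolutions concrete, here is a minimal sketch of the mapping described above; `sketch_resolve` is a hypothetical helper, not the actual `_resolve_task` implementation:

```python
from typing import Literal

RunnerType = Literal["generate", "pooling", "draft", "transcription_only"]
Task = Literal["generate", "embed", "classify", "reward", "draft",
               "transcription"]

def sketch_resolve(supports_transcription: bool,
                   supports_generation: bool) -> tuple[RunnerType, list[Task]]:
    """Hypothetical sketch of the dynamic resolution described above."""
    if supports_transcription and not supports_generation:
        # Whisper-style models: a dedicated runner that cannot expose
        # completions endpoints.
        return "transcription_only", ["transcription"]
    tasks: list[Task] = ["generate"]
    if supports_transcription:
        # New multimodal models get both completions and transcription.
        tasks.append("transcription")
    return "generate", tasks

# Example: a new multimodal transcription model resolves to
# ("generate", ["generate", "transcription"]), while Whisper resolves to
# ("transcription_only", ["transcription"]).
```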
Implementation Details
See #20771 (comment)