@huachenheli huachenheli commented Aug 19, 2025

Summary:

There are repeated implementations of registry-like classes that aim to provide extension support for various classes. Here we introduce an ExtensionManager as a formal registry for custom extensions so that we can consolidate such implementations.
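
To illustrate the pattern described above, a registry of this kind could look roughly like the sketch below. This is not the code in this PR: the class name `ToyExtensionManager`, the `create` signature, the `tool_parser_manager` instance, and the error handling are all assumptions made for illustration. Only the `register(names=[...])` decorator form is taken from this thread.

```python
from typing import Any, Callable, Dict, List


class ToyExtensionManager:
    """Hypothetical registry that maps extension names to classes."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._classes: Dict[str, type] = {}

    def register(self, names: List[str]) -> Callable[[type], type]:
        """Return a class decorator that registers the class under each name."""

        def decorator(cls: type) -> type:
            for name in names:
                if name in self._classes:
                    raise ValueError(f"{self.kind} '{name}' is already registered")
                self._classes[name] = cls
            return cls

        return decorator

    def create(self, name: str, *args: Any, **kwargs: Any) -> Any:
        """Instantiate the class registered under `name`."""
        try:
            cls = self._classes[name]
        except KeyError:
            raise ValueError(f"Unknown {self.kind}: '{name}'") from None
        return cls(*args, **kwargs)


# One manager instance per extension point (the instance name is hypothetical).
tool_parser_manager = ToyExtensionManager("tool parser")


@tool_parser_manager.register(names=["echo", "echo_v1"])
class EchoParser:
    def __init__(self, prefix: str = "") -> None:
        self.prefix = prefix

    def parse(self, text: str) -> str:
        return self.prefix + text


# Look up the class by any of its registered names and instantiate it.
parser = tool_parser_manager.create("echo_v1", prefix="> ")
print(parser.parse("hello"))
```

Keeping one manager instance per extension point means every registry shares the same `register`/`create` vocabulary, which is the uniformity this summary argues for.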

Note: The KV connector factory changes have been reverted due to concerns around lazy imports.

Additional registry-like classes will be migrated one by one, given that this PR is already large.

Related to #22932

Test Plan:

pytest tests/plugins/test_extension_manager.py

Model loader + video loader:

python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-VL-7B-Instruct --port 8001 --host 0.0.0.0 --dtype bfloat16 --limit-mm-per-prompt '{"video":1}' --tool-call-parser hermes --enable-auto-tool-choice

Tool call:

python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-7B-Instruct --port 8001 --host 0.0.0.0 --dtype bfloat16 --tool-call-parser hermes --enable-auto-tool-choice

Reviewers:

Subscribers:

cc. @robertgshaw2-redhat @yeqcharlotte

Tasks:

Tags:


Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Signed-off-by: Chenheli Hua <[email protected]>
@mergify mergify bot added the multi-modality, v1, and tpu labels Aug 19, 2025
@mergify mergify bot added the deepseek, frontend, llama, and tool-calling labels Aug 20, 2025
@huachenheli huachenheli force-pushed the extension_manager branch 2 times, most recently from fe7bd47 to d788bd1 Compare August 21, 2025 19:23
@huachenheli huachenheli changed the title from "[WIP][RFC] Formalize class-level extension management & consolidate registry-like implementations." to "[RFC] Formalize class-level extension management & consolidate registry-like implementations." Aug 22, 2025
@huachenheli huachenheli marked this pull request as ready for review August 22, 2025 03:38
@huachenheli huachenheli requested a review from hmellor as a code owner August 22, 2025 03:46
@mergify mergify bot added the documentation label Aug 22, 2025
@youkaichao youkaichao left a comment

"Please note that this may break existing use cases that rely on the old extension mechanisms. Small code changes are needed to switch such use cases to ExtensionManager. However, the change should be very simple (i.e., add an xyz_manager.register(names=[...]) decorator)."

Why do we need to consolidate these while breaking existing code? I'm afraid that consolidating them into one file would cause all sorts of problems around lazy imports and cross-process serialization that we would then need to take care of.

huachenheli commented Aug 22, 2025

The "manager" classes and methods are named slightly differently (so that they are uniform across the code base), so users would need to tweak their code a bit to make it work. The changes are mostly renames.

Regarding lazy imports: at the moment this mainly affects the KV connector. There is a create_or_import method that allows on-the-fly imports of code that is linked but not explicitly called by vLLM's code. In both the unit tests and disaggregated_prefill.sh, it correctly imports the target class even without an explicit link to its code. Of course, the linked code needs to call xyz_manager.register(...), but that is a minor change.
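
A rough sketch of what an import-on-demand lookup could look like is shown below. The real signature and behavior of create_or_import are not shown in this thread, so everything here is an assumption: the class name `ToyLazyManager`, the convention of passing a dotted `package.module.ClassName` path, and the caching step. The comment above says the imported module is expected to call register itself; to stay self-contained, this sketch resolves the class by attribute lookup instead.

```python
import importlib
from typing import Any, Callable, Dict, List


class ToyLazyManager:
    """Hypothetical registry with an import-on-demand fallback."""

    def __init__(self) -> None:
        self._classes: Dict[str, type] = {}

    def register(self, names: List[str]) -> Callable[[type], type]:
        def decorator(cls: type) -> type:
            for name in names:
                self._classes[name] = cls
            return cls

        return decorator

    def create_or_import(self, name: str, *args: Any, **kwargs: Any) -> Any:
        """Create from a registered name, else import a dotted class path."""
        cls = self._classes.get(name)
        if cls is None:
            # Assumption: unregistered names are "package.module.ClassName".
            module_path, _, class_name = name.rpartition(".")
            if not module_path:
                raise ValueError(f"Unknown extension: '{name}'")
            # The module is only imported when it is first requested.
            module = importlib.import_module(module_path)
            cls = getattr(module, class_name)
            self._classes[name] = cls  # cache so later lookups skip the import
        return cls(*args, **kwargs)


manager = ToyLazyManager()

# A standard-library class stands in for an out-of-tree extension here.
ordered = manager.create_or_import("collections.OrderedDict", a=1)
print(type(ordered).__name__, dict(ordered))
```

The trade-off of deferring the import is that a bad path only fails at creation time rather than at startup.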

Regarding "cross-process serialization problems", can you elaborate on the issue here? This change should only affect class instantiation at startup time, so it should not affect the runtime behavior of existing code.

KV connector factory changes have been removed from this PR.

@huachenheli huachenheli requested a review from youkaichao August 22, 2025 17:02
hmellor commented Aug 25, 2025

Could we please maintain backward compatibility using https://typing.python.org/en/latest/spec/overload.html?

You can use https://typing-extensions.readthedocs.io/en/latest/index.html#typing_extensions.deprecated on the overloads so that we can let users know that they should switch to the new names.
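
As a sketch of how this suggestion might be applied, the example below keeps a legacy keyword working next to the new one. The class `ToyManager` and the legacy `name=` keyword are hypothetical and not taken from vLLM. Note that `deprecated` on an overload is only visible to static type checkers, so the implementation also emits a runtime `DeprecationWarning` itself. The no-op fallback for `deprecated` exists only so the sketch runs where neither Python 3.13 nor typing_extensions is available.

```python
import warnings
from typing import Callable, Dict, List, Optional, overload

try:  # Python 3.13+
    from warnings import deprecated
except ImportError:
    try:
        from typing_extensions import deprecated
    except ImportError:
        # Fallback shim (assumption for portability): keeps the sketch
        # runnable, but type checkers get no deprecation information.
        def deprecated(message: str):
            def wrap(obj):
                return obj

            return wrap


class ToyManager:
    """Hypothetical manager with a new API and a deprecated legacy keyword."""

    def __init__(self) -> None:
        self._classes: Dict[str, type] = {}

    @overload
    def register(self, *, names: List[str]) -> Callable[[type], type]: ...

    @overload
    @deprecated("Pass names=[...] instead of name=...")
    def register(self, *, name: str) -> Callable[[type], type]: ...

    def register(
        self,
        *,
        names: Optional[List[str]] = None,
        name: Optional[str] = None,
    ) -> Callable[[type], type]:
        if name is not None:
            # Overloads are discarded at runtime, so warn here explicitly.
            warnings.warn(
                "register(name=...) is deprecated; use register(names=[...])",
                DeprecationWarning,
                stacklevel=2,
            )
            names = [name]
        if not names:
            raise ValueError("At least one name is required")
        registered = list(names)

        def decorator(cls: type) -> type:
            for key in registered:
                self._classes[key] = cls
            return cls

        return decorator


manager = ToyManager()


@manager.register(names=["new_style"])
class NewStyle:
    pass
```

Existing callers that still pass `name=` keep working and see a warning, which gives them a migration window before the old spelling is removed.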

@huachenheli

I reverted the KV connector factory changes, since the concern is mainly around lazy imports. The rest are straightforward register/create cases, so they can be migrated cleanly. The legacy ToolParserManager API is also kept for compatibility.

@KuntaiDu

I guess that by "cross-process serialization issue", Kaichao means Ray-related issues (vLLM currently relies on Ray to run cross-node pipeline parallelism).

mergify bot commented Aug 28, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @huachenheli.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Labels: deepseek, documentation, frontend, llama, multi-modality, needs-rebase, tool-calling, tpu, v1
Projects
Status: Done
5 participants