[V0 Deprecation] Remove pooling model support in V0 #23434
Conversation
Signed-off-by: Woosuk Kwon <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Code Review
This pull request effectively removes support for pooling models in the vLLM V0 worker, which aligns with the stated objective. The changes are comprehensive, touching upon the engine, worker, sequence management, and tests to eliminate the V0 pooling code paths. Key modifications include the removal of PoolingModelRunner and V0-specific pooling metadata, updating LLMEngine and AsyncLLMEngine to handle only sampling requests, and stubbing out pooling-related entry points to raise NotImplementedError. The removal of token_type_ids and related logic is also consistent with this goal. The code modifications appear to be correct and consistently applied across the repository. I have not found any issues of high or critical severity.
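To illustrate what "stubbing out pooling-related entry points to raise NotImplementedError" might look like in practice, here is a minimal, hypothetical sketch. The class and method names below are simplified stand-ins, not vLLM's actual implementation: sampling paths stay functional while pooling paths fail fast with a clear message.

```python
class LLMEngine:
    """Illustrative stand-in for a V0 engine after pooling removal.

    Not vLLM's real class; a sketch of the stubbing pattern only.
    """

    def generate(self, prompt: str) -> str:
        # Sampling requests remain supported in the V0 engine.
        return f"generated: {prompt}"

    def encode(self, prompt: str):
        # Pooling entry point stubbed out: it no longer runs any model
        # code and instead directs users to the V1 engine.
        raise NotImplementedError(
            "Pooling models are no longer supported in V0; "
            "please use the V1 engine instead."
        )
```

The advantage of stubbing rather than deleting the method outright is that callers get an actionable error message instead of an AttributeError.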
Not that I'm aware of.
I think it's quite difficult to clean it all up at once. It might take some time to completely remove all the "compatibility code". I hope this PR lands as soon as possible, so we can focus our optimization efforts on a single engine.
I'm running into weird typing issues. Why is
Remove all run_with_both_engines in vllm/tests/models/language/pooling and vllm/tests/entrypoints/. Let's land this PR quickly as long as CI turns green, then polish it in subsequent PRs.
Clean up v0_only in tests/models/registry.py
@DarkLight1337, the basic tests are passing now. Can you enable the full test suite?
Can merge if tests pass
Is the failed test caused by a flaky test?
Yes
Follow-up issue: #23883
Continuation of PR #23302
Summary
@DarkLight1337 , @noooop , do you remember anything else to remove?