[BugFix] Fix KVConnectorOutput TPU breakage #22598
Conversation
Incorrect assumptions were being made about when `KVConnectorOutput` can be `None`.

Signed-off-by: Nick Hill <[email protected]>
Code Review
This pull request fixes a bug where TPU functionality was broken when using a KVConnector. The issue stemmed from `TPUModelRunner` always creating a `KVConnectorOutput` object, even when KV transfer was disabled, which caused an assertion failure in the scheduler.

The fix modifies `TPUModelRunner` to only create a `KVConnectorOutput` object when there are actual finished KV transfers, and to set it to `None` otherwise. A corresponding change is made in the test utilities. Additionally, a defensive check is added to the scheduler to handle cases where a model runner might still produce a `KVConnectorOutput` when no connector is configured, preventing potential crashes.

The changes are logical, directly address the bug, and improve the robustness of the scheduler. The fix appears correct and complete.
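For readers following along, here is a minimal sketch of the shape of the fix described above. The class and field names mirror the review's description rather than vLLM's exact internals, which may differ by version:

```python
# Hypothetical sketch of the fix (names follow the review above,
# not necessarily vLLM's exact internals).
from dataclasses import dataclass


@dataclass
class KVConnectorOutput:
    # IDs of requests whose KV-cache sends/recvs finished this step.
    finished_sending: set[str] | None = None
    finished_recving: set[str] | None = None


def make_kv_connector_output(
    finished_sending: set[str] | None,
    finished_recving: set[str] | None,
) -> KVConnectorOutput | None:
    # Only build an output when there is something to report; returning
    # None preserves the scheduler's "no connector => no output" invariant.
    if not finished_sending and not finished_recving:
        return None
    return KVConnectorOutput(finished_sending, finished_recving)


def update_from_kv_output(connector, kv_connector_output) -> None:
    # Defensive check in the scheduler: ignore a stray output from a
    # model runner when no connector is configured, instead of asserting.
    if connector is None or kv_connector_output is None:
        return
    ...  # process finished transfers
```

Returning `None` when nothing has finished keeps the scheduler's assumption intact, while the defensive check guards against model runners that still emit an output without a configured connector.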
Thanks for fixing!
### What this PR does / why we need it?
Mainline vLLM fixes its disaggregated prefill in vllm-project/vllm#22598, but it is still not working in vllm-ascend. Concretely, decoder instances crash before vLLM's fix and hang after it on Ascend devices. This patch allows disaggregated prefill to work.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Qwen3-0.6B 1P1D tp=1 dp=1
- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@0fe8508

Signed-off-by: CaveNightingale <[email protected]>
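For context, the 1P1D (one prefill instance, one decode instance) setup mentioned above is typically wired together through vLLM's KV-transfer configuration. Below is a minimal offline sketch modeled on vLLM's disaggregated-prefill examples; the connector name and config fields are assumptions and may differ across versions:

```python
from vllm import LLM
from vllm.config import KVTransferConfig

# Prefill ("producer") instance: computes KV caches and ships them
# to the decode instance. Connector name is illustrative.
prefill_llm = LLM(
    model="Qwen/Qwen3-0.6B",
    tensor_parallel_size=1,
    kv_transfer_config=KVTransferConfig(
        kv_connector="PyNcclConnector",
        kv_role="kv_producer",
        kv_rank=0,
        kv_parallel_size=2,
    ),
)

# Decode ("consumer") instance, normally launched as a separate
# process/host: same config except kv_role="kv_consumer", kv_rank=1.
```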
TPU functionality was broken by #22157. This is a quick fix; it would be good to also apply the changes in #21980 to `TPUModelRunner`.

cc @sdavidbd