-
-
Notifications
You must be signed in to change notification settings - Fork 9.2k
[Core] Optimize update checks in LogitsProcessor #21245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request optimizes update checks in MinTokensLogitsProcessor
. I've added a suggestion to improve the maintainability of the new logic by making it more explicit and avoiding a side effect in a conditional statement.
Thanks @Jialin. I think I had similar logic in the my original impl of these LPs here https://github.com/vllm-project/vllm/pull/13360/files#diff-d01f143e1af472f24af24842cb879907ce624e6e5c977935e944545240723529R51 and hadn't realized that had been changed. cc @afeldman-nm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Signed-off-by: Jialin Ouyang <[email protected]>
Signed-off-by: Jialin Ouyang <[email protected]>
Head branch was pushed to by a user without write access
Thanks @Jialin ! I think this was probably my bad so thanks for the fix |
No worry :) |
Signed-off-by: Jialin Ouyang <[email protected]>
Signed-off-by: Jialin Ouyang <[email protected]> Signed-off-by: qizixi <[email protected]>
Signed-off-by: Jialin Ouyang <[email protected]>
Signed-off-by: Jialin Ouyang <[email protected]> Signed-off-by: avigny <[email protected]>
Signed-off-by: Jialin Ouyang <[email protected]> Signed-off-by: shuw <[email protected]>
Signed-off-by: Jialin Ouyang <[email protected]> Signed-off-by: x22x22 <[email protected]>
Signed-off-by: Jialin Ouyang <[email protected]>
Signed-off-by: Jialin Ouyang <[email protected]>
Essential Elements of an Effective PR Description Checklist
supported_models.md
andexamples
for a new model.Purpose
Fix update checks in MinTokensLogitsProcessor and LogitBiasLogitsProcessor. For a benchmark run without override min length or logit bias, we still see noticeable cost coming from MinTokensLogitsProcessor and LogitBiasLogitsProcessor.
We found that it's due to inefficient needs_update tagging which would be tagged to True whenever there're new requests added to the batch. In this diff, we would tag needs_update to True, if
Test Plan
Rerun the benchmark.
Test Result
Confirmed the cost from MinTokensLogitsProcessor and LogitBiasLogitsProcessor is mostly gone.
After


Before
(Optional) Documentation Update