[CB] Refactoring/Cleaning up prepare_prompt/decode #335
Signed-off-by: Yannick Schnider <[email protected]>
👋 Hi! Thank you for contributing to vLLM support on Spyre.
Now you are good to go 🚀
# current decode batch (Spyre constraint)
left_padding = self.tkv - prompt_len

# Reserve the maximal number of blocks used to serve current sequence
maybe? I was confused by "maximal" and "current"
- # Reserve the maximal number of blocks used to serve current sequence
+ # Reserve the number of blocks required to serve this new sequence
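For readers following the thread, here is a rough sketch of what the two commented steps might compute. It is not the code from this PR: the function names, the `block_size` default of 64, and the assumption that a new sequence needs room for `tkv + max_new_tokens` positions are all hypothetical and only meant to illustrate the wording of the comment.

```python
# Hypothetical sketch, not the actual vllm-spyre implementation.

def left_padding(tkv: int, prompt_len: int) -> int:
    """Pad a new prompt on the left so it lines up with the current tkv."""
    if prompt_len > tkv:
        raise ValueError("prompt is longer than the current tkv")
    return tkv - prompt_len


def blocks_to_reserve(tkv: int, max_new_tokens: int, block_size: int = 64) -> int:
    """Number of KV-cache blocks needed for this new sequence.

    Assumption: the sequence occupies tkv positions after padding and may
    grow by max_new_tokens more; block_size=64 is an illustrative value.
    """
    total_positions = tkv + max_new_tokens
    # Integer ceiling division: a partially filled block still counts.
    return -(-total_positions // block_size)
```

Under these assumptions, the reservation depends only on the new sequence, which is what the suggested comment wording is trying to convey.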
LGTM
Was so nice to read!
[CB] Refactoring/Cleaning up prepare_prompt/decode
As prefills for batch size > 1 have been de-prioritized, this PR cleans up the code considerably and makes it more readable. Code readability becomes increasingly important with the upcoming optimization for homogeneous tkv (e.g. #262).