[Documentation][Spec Decode] Add documentation about lossless guarantees in Speculative Decoding in vLLM #7962
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
🚀 `/ready`
docs/source/models/spec_decode.rst
Outdated
3. **vLLM Logprob Stability**
   - vLLM currently does not guarantee stable log probabilities (logprobs) across different batch sizes, which might
     cause small variations in output probabilities.
     This issue may stem from non-deterministic behaviors in batched operations or numerical instability in Torch operations,
     as explained in the `Numerical Accuracy section <https://pytorch.org/docs/stable/notes/numerical_accuracy.html#batched-computations-or-slice-computations>`_.
@sroy745 this isn't spec decoding specific, it applies generally when concurrent requests are batched differently. I guess it would be good to have a dedicated section explaining that too...
I added a section for this in serving/faq.rst (I could not find any other generic place to add it; as you mentioned, it is not specific to spec decode, so I thought of adding it to the serving FAQs). I added a link to it in this subsection. I am not sure if this is what you meant. PTAL and let me know.
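For illustration, here is a minimal standalone sketch (not vLLM code; it only assumes a local PyTorch install, and the shapes are made up) of the batched-computation effect described in the PyTorch numerical accuracy notes linked above:

```python
import torch

torch.manual_seed(0)

# Made-up stand-ins for hidden states and an LM head projection.
hidden = torch.randn(8, 4096)
lm_head = torch.randn(4096, 32000)

# Logits for the first sequence computed alone vs. inside a larger batch.
logits_alone = hidden[:1] @ lm_head
logits_batched = (hidden @ lm_head)[:1]

# The two results may not be bit-identical: batched kernels can use a
# different reduction order, so floating-point rounding differs slightly.
print(torch.equal(logits_alone, logits_batched))            # may be False
print((logits_alone - logits_batched).abs().max().item())   # small, possibly nonzero
```

The differences are tiny, but when two candidate tokens have near-identical probabilities they can be enough to change which token is picked, which is how batch size ends up perturbing outputs.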
docs/source/models/spec_decode.rst
Outdated
2. **Algorithmic Losslessness**
   - vLLM’s implementation of speculative decoding is algorithmically validated to be lossless when the
     temperature parameter (`temp`) is set to 0. Key tests include:
the rejection sampler convergence tests also handle the case where temperature is nonzero, and/or other sampling parameters are applied.
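For reference, a standalone sketch of the acceptance rule those convergence tests exercise (this illustrates the textbook speculative-sampling rule, not vLLM's actual rejection sampler): the draft token is accepted with probability min(1, p_target/p_draft), and on rejection a token is resampled from the normalized residual max(0, p_target - p_draft), which recovers the target distribution exactly at any temperature.

```python
import torch

def accept_or_resample(p_target: torch.Tensor,
                       p_draft: torch.Tensor,
                       draft_token: int) -> int:
    """Illustrative speculative-sampling step for one proposed token.

    p_target / p_draft are full-vocabulary distributions *after* all sampling
    transforms (temperature, top-p, ...) have been applied, so losslessness
    does not depend on temperature being zero.
    """
    # Accept the draft token with probability min(1, p_target / p_draft).
    accept_prob = torch.clamp(p_target[draft_token] / p_draft[draft_token], max=1.0)
    if torch.rand(()) < accept_prob:
        return draft_token
    # Rejected: resample from the renormalized residual max(0, p_target - p_draft).
    # Acceptance + residual resampling together reproduce p_target exactly.
    residual = torch.clamp(p_target - p_draft, min=0.0)
    return int(torch.multinomial(residual / residual.sum(), num_samples=1))
```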
Done. Removed mention of temperature = 0 in the comment.
Thanks for the review. Addressed your comments.
Thanks @sroy745, looks great, sorry for the late review!
…ees in Speculative Decoding in vLLM (vllm-project#7962)
…ees in Speculative Decoding in vLLM (vllm-project#7962) Signed-off-by: Alvant <[email protected]>
…ees in Speculative Decoding in vLLM (vllm-project#7962) Signed-off-by: LeiWang1999 <[email protected]>
Add documentation about lossless guarantees in Speculative Decoding in vLLM. It documents the findings from issue #7627.
@cadedaniel
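For anyone wanting to spot-check the greedy case locally, a rough sketch (parameter names such as `speculative_model` and `num_speculative_tokens` are taken from the spec_decode docs of this period and may differ across vLLM versions; the models are just examples):

```python
from vllm import LLM, SamplingParams

prompts = ["The future of AI is"]
# Greedy sampling: outputs should match token-for-token if decoding is lossless,
# though batch-dependent numerics (see the logprob stability note above) can
# still flip near-tied tokens.
params = SamplingParams(temperature=0, max_tokens=64)

baseline = LLM(model="facebook/opt-6.7b")
base_text = baseline.generate(prompts, params)[0].outputs[0].text

spec = LLM(
    model="facebook/opt-6.7b",
    speculative_model="facebook/opt-125m",  # draft model (assumed arg name)
    num_speculative_tokens=5,               # assumed arg name
)
spec_text = spec.generate(prompts, params)[0].outputs[0].text

print(base_text == spec_text)
```

If GPU memory is tight, run the two configurations in separate processes rather than instantiating both engines in one script.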