@ispobock
Collaborator

Motivation

Background: #9252
For target extend, we should set has_been_filtered=False, since the batch has not actually been filtered. has_been_filtered should be set to True only when the batch is in target verify mode.
Otherwise, in the last round of chunked prefill, the chunked prefill request is excluded from keep_indices, so len(keep_indices) != len(spec_info.topk_p), which triggers the warning:

length of new_indices: {len(new_indices)} != length of topk_p: {len(self.topk_p)}, this should not happen
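
The mismatch described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the sglang implementation: ToySpecInfo, the plain-list topk_p, and the truncate-versus-index behavior of the two branches are hypothetical simplifications chosen to show why passing has_been_filtered=True for an unfiltered batch leads to the length warning.

```python
import logging
from dataclasses import dataclass
from typing import List

logger = logging.getLogger("toy_spec")


@dataclass
class ToySpecInfo:
    """Hypothetical stand-in for the draft state: one topk_p entry per request."""

    topk_p: List[float]

    def filter_batch(self, new_indices: List[int], has_been_filtered: bool = True) -> bool:
        """Keep the requests in new_indices. Returns True if the warning fired."""
        warned = False
        if has_been_filtered:
            # Assumption: an earlier step already dropped finished requests,
            # so this branch only checks and trims the length.
            if len(new_indices) != len(self.topk_p):
                logger.warning(
                    "length of new_indices: %d != length of topk_p: %d, "
                    "this should not happen",
                    len(new_indices),
                    len(self.topk_p),
                )
                warned = True
            self.topk_p = self.topk_p[: len(new_indices)]
        else:
            # Not filtered yet: select the surviving rows by index.
            self.topk_p = [self.topk_p[i] for i in new_indices]
        return warned


# Two requests in an extend batch. Request 0 is the chunked prefill request,
# so it is excluded from keep_indices.
keep_indices = [1]

wrong = ToySpecInfo(topk_p=[0.9, 0.4])
warned_wrong = wrong.filter_batch(keep_indices, has_been_filtered=True)

right = ToySpecInfo(topk_p=[0.9, 0.4])
warned_right = right.filter_batch(keep_indices, has_been_filtered=False)
```

In this toy, the has_been_filtered=True call both fires the warning and keeps the entry of the excluded request, while has_been_filtered=False selects the entry of the request that actually remains.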

Issue Reproduction:

python3 -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --speculative-algo EAGLE3 \
    --speculative-draft-model-path lmsys/sglang-EAGLE3-LLaMA3.1-Instruct-8B  \
    --speculative-num-steps 2 --speculative-eagle-topk 1 --speculative-num-draft-tokens 3 \
    --dtype float16 --chunked-prefill-size 4

# change the prompt in send_one.py to:
#        prompt = [
#            "Human: Give me a fully functional FastAPI server. Show the python code.\n\nAssistant:",
#            "Human: Give me a fully functional FastAPI server.  the python code.\n\nAssistant:"
#        ]
python3 -m sglang.test.send_one --batch-size 2

@gemini-code-assist
Contributor

Summary of Changes

Hello @ispobock, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug within the speculative decoding mechanism, specifically concerning how batch filtering is handled during target extension. By correctly setting the has_been_filtered flag only when the batch is in target verification mode, it prevents erroneous exclusion of requests in chunked prefill, thereby enhancing the stability and correctness of the speculative decoding process.

Highlights

  • Speculative Decoding Filter Fix: Corrects an issue in the filter_batch method for speculative decoding, specifically addressing how the has_been_filtered flag is managed.
  • has_been_filtered Logic: Ensures the has_been_filtered flag is accurately set to True only when the batch is in target verification mode (self.forward_mode.is_target_verify()).
  • Prevents Incorrect Filtering: Resolves a bug where has_been_filtered was erroneously True during target extension, which caused incorrect filtering and warnings in chunked prefill operations.
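
A minimal sketch of the call-site gating summarized in the highlights, assuming a simplified forward-mode enum. ToyForwardMode and filter_kwargs are hypothetical names for illustration; only is_target_verify() and the has_been_filtered keyword come from this pull request.

```python
from enum import Enum, auto
from typing import Any, Dict, List


class ToyForwardMode(Enum):
    """Hypothetical, reduced set of forward modes."""

    EXTEND = auto()
    DECODE = auto()
    TARGET_VERIFY = auto()

    def is_target_verify(self) -> bool:
        return self is ToyForwardMode.TARGET_VERIFY


def filter_kwargs(forward_mode: ToyForwardMode, keep_indices: List[int]) -> Dict[str, Any]:
    """Build keyword arguments for a filter_batch-style call.

    The flag is derived from the forward mode instead of being hard-coded,
    so only target verify batches are treated as already filtered.
    """
    return {
        "new_indices": keep_indices,
        "has_been_filtered": forward_mode.is_target_verify(),
    }


extend_kwargs = filter_kwargs(ToyForwardMode.EXTEND, [1])
verify_kwargs = filter_kwargs(ToyForwardMode.TARGET_VERIFY, [0, 1])
```

Passing the arguments by keyword keeps the meaning of the boolean visible at the call site.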

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request fixes a bug in the speculative decoding logic by conditionally filtering spec_info based on the forward mode. The change introduces a has_been_filtered flag to spec_info.filter_batch, which is correctly set to True only for TARGET_VERIFY mode. This prevents an incorrect filtering of speculative decoding data during other modes like chunked prefill, resolving a warning. The implementation is clear and correct. The switch to using keyword arguments also improves code readability.

@ispobock ispobock mentioned this pull request Sep 27, 2025
6 tasks
@ispobock ispobock merged commit 424591d into main Sep 30, 2025
95 of 115 checks passed
@ispobock ispobock deleted the ke/fix-eagle-filter-batch branch September 30, 2025 06:44
ch-tiger1 pushed a commit to ch-tiger1/sglang that referenced this pull request Oct 9, 2025

5 participants