Fix spec filter batch when target extend #10991

ispobock · 2025-09-27T15:53:23Z

Motivation

Background: #9252
For target extend, we should set has_been_filtered=False, since the batch has not actually been filtered. has_been_filtered should be set to True only when the batch is in target verify mode.
Otherwise, in the last round of chunked prefill, the chunked prefill request is excluded from keep_indices, so len(keep_indices) != len(spec_info.topk_p), and the following warning is emitted:

length of new_indices: {len(new_indices)} != length of topk_p: {len(self.topk_p)}, this should not happen
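
The mismatch described above can be illustrated with a small, self-contained sketch. It is not the sglang implementation: `ToySpecInfo` and `filter_for_mode` are hypothetical names, plain Python lists stand in for tensors, and the truncation in the `has_been_filtered=True` branch is an assumption made only for illustration. The only behavior taken from this PR is that `has_been_filtered` should be True solely in target verify mode, and that a length mismatch triggers the warning.

```python
import warnings


class ToySpecInfo:
    """Hypothetical stand-in for the speculative-decoding state (not sglang's class)."""

    def __init__(self, topk_p):
        self.topk_p = list(topk_p)

    def filter_batch(self, new_indices, has_been_filtered):
        if has_been_filtered:
            # Assumption: the state was already pruned earlier (target verify),
            # so only the lengths are compared and no index-based selection happens.
            if len(new_indices) != len(self.topk_p):
                warnings.warn(
                    f"length of new_indices: {len(new_indices)} != "
                    f"length of topk_p: {len(self.topk_p)}, this should not happen"
                )
            self.topk_p = self.topk_p[: len(new_indices)]
        else:
            # Not filtered yet (e.g. target extend): select rows by index.
            self.topk_p = [self.topk_p[i] for i in new_indices]


def filter_for_mode(spec_info, keep_indices, is_target_verify):
    # Pass has_been_filtered=True only when the batch is in target verify mode.
    spec_info.filter_batch(keep_indices, has_been_filtered=is_target_verify)


# Three requests; the one at index 1 is the chunked prefill request being excluded.
state = ToySpecInfo([0.9, 0.8, 0.7])
filter_for_mode(state, keep_indices=[0, 2], is_target_verify=False)
print(state.topk_p)  # [0.9, 0.7]
```

Calling `filter_for_mode` with `is_target_verify=True` on the same input emits the warning in this toy model and keeps the first two entries rather than the requested ones, which is the kind of inconsistency the PR description attributes to setting the flag during target extend.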

Issue Reproduction:

python3 -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --speculative-algo EAGLE3 \
    --speculative-draft-model-path lmsys/sglang-EAGLE3-LLaMA3.1-Instruct-8B  \
    --speculative-num-steps 2 --speculative-eagle-topk 1 --speculative-num-draft-tokens 3 \
    --dtype float16 --chunked-prefill-size 4

# change the prompt in send_one.py to:
#        prompt = [
#            "Human: Give me a fully functional FastAPI server. Show the python code.\n\nAssistant:",
#            "Human: Give me a fully functional FastAPI server.  the python code.\n\nAssistant:"
#        ]
python3 -m sglang.test.send_one --batch-size 2

gemini-code-assist · 2025-09-27T15:53:35Z

Summary of Changes

Hello @ispobock, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug within the speculative decoding mechanism, specifically concerning how batch filtering is handled during target extension. By correctly setting the has_been_filtered flag only when the batch is in target verification mode, it prevents erroneous exclusion of requests in chunked prefill, thereby enhancing the stability and correctness of the speculative decoding process.

Highlights

Speculative Decoding Filter Fix: Corrects an issue in the filter_batch method for speculative decoding, specifically addressing how the has_been_filtered flag is managed.
has_been_filtered Logic: Ensures the has_been_filtered flag is accurately set to True only when the batch is in target verification mode (self.forward_mode.is_target_verify()).
Prevents Incorrect Filtering: Resolves a bug where has_been_filtered was erroneously True during target extension, which caused incorrect filtering and warnings in chunked prefill operations.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request fixes a bug in the speculative decoding logic by conditionally filtering spec_info based on the forward mode. The change introduces a has_been_filtered flag to spec_info.filter_batch, which is correctly set to True only for TARGET_VERIFY mode. This prevents an incorrect filtering of speculative decoding data during other modes like chunked prefill, resolving a warning. The implementation is clear and correct. The switch to using keyword arguments also improves code readability.

fix eagle filter batch

aedd878

ispobock requested review from Ying1123, hnyls2002, merrymercy and xiezhq-hermann as code owners September 27, 2025 15:53

sglang-bot added the run-ci label Sep 27, 2025

Merge branch 'main' into ke/fix-eagle-filter-batch

60ebd71

ispobock assigned zyksir Sep 27, 2025

gemini-code-assist bot reviewed Sep 27, 2025

ispobock assigned hnyls2002 Sep 27, 2025

format

5717592

ispobock mentioned this pull request Sep 27, 2025

Fix eagle radix cache #10846

Merged

6 tasks

ispobock added 4 commits September 29, 2025 11:44

fix

c4c7914

Merge branch 'main' into ke/fix-eagle-filter-batch

df2283b

fix

31dd762

fix

c3b810c

ispobock requested a review from kssteven418 as a code owner September 29, 2025 13:14

yikaizhu-baseten approved these changes Sep 30, 2025

zyksir approved these changes Sep 30, 2025

ispobock merged commit 424591d into main Sep 30, 2025
95 of 115 checks passed

ispobock deleted the ke/fix-eagle-filter-batch branch September 30, 2025 06:44

ch-tiger1 pushed a commit to ch-tiger1/sglang that referenced this pull request Oct 9, 2025

Fix spec filter batch when target extend (sgl-project#10991)

3f48475

5 participants