ispobock
Collaborator

@ispobock ispobock commented Sep 24, 2025

Motivation

python3 -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --speculative-algo EAGLE3 \
    --speculative-draft-model-path lmsys/sglang-EAGLE3-LLaMA3.1-Instruct-8B  \
    --speculative-num-steps 2 --speculative-eagle-topk 1 --speculative-num-draft-tokens 3 \
    --dtype float16
    
cd benchmark/mtbench
python3 bench_sglang_eagle.py --parallel 1 --num-questions 10

main:

w/o radix cache:
#questions: 10, Throughput: 269.86 token/s, Acceptance length: 2.39

w/ radix cache:
#questions: 10, Throughput: 255.74 token/s, Acceptance length: 2.26

this PR:

w/o radix cache:
#questions: 10, Throughput: 269.45 token/s, Acceptance length: 2.39

w/ radix cache:
#questions: 10, Throughput: 270.85 token/s, Acceptance length: 2.39
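As a quick check on the figures above, the relative throughput change between main and this PR can be computed directly from the reported numbers. This is only a small arithmetic sketch; the helper name `relative_change` is made up for illustration and is not part of the benchmark script.

```python
def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# Throughput (token/s) copied from the results above.
main_with_radix = 255.74
pr_with_radix = 270.85
main_without_radix = 269.86
pr_without_radix = 269.45

print(f"w/ radix cache:  {relative_change(main_with_radix, pr_with_radix):+.1f}%")
print(f"w/o radix cache: {relative_change(main_without_radix, pr_without_radix):+.1f}%")
```

With the radix cache enabled, throughput rises by roughly 5.9% and the acceptance length returns to 2.39, while the run without the radix cache is essentially unchanged.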

compatibility test:

  • page size 16
  • page size 16 + chunked prefill 64
  • page size 2 + chunked prefill 64
  • page size 1 + chunked prefill 64
  • HiCache (this fix is not adapted for HiCache, but doesn't break current HiCache)
  • multiple prompts in one batch (chunked prefill corner case; ref: Fix spec filter batch when target extend #10991)

Collaborator

@JustinTong0323 JustinTong0323 left a comment

Awesome!

@ispobock ispobock changed the title Fix eagle radix cache [Do not merge] Fix eagle radix cache Sep 24, 2025
@ispobock
Collaborator Author

page_size > 1 still has some issues; I will fix them later.

@xiezhq-hermann xiezhq-hermann self-assigned this Sep 25, 2025
@ispobock ispobock changed the title [Do not merge] Fix eagle radix cache Fix eagle radix cache Sep 26, 2025
@ispobock
Collaborator Author

@xiezhq-hermann @merrymercy This PR is ready for review.

        if value is None:
            value = torch.tensor(key.token_ids, dtype=torch.int64)

        if self.is_eagle:
Collaborator

hiradix (and other trees such as swa) override the insert function. Would that be a problem, given that the eagle worker shares the same tree?

Collaborator Author

Yes. In the current design, we need to adapt this change to other trees such as swa and hiradix if they override these functions. This PR only makes the main radix tree ready; HiCache and swa need extra work and testing before they are ready.
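The concern in this exchange can be illustrated with a toy example. The classes below are hypothetical stand-ins, not SGLang's actual `RadixCache`, hiradix, or swa implementations, and `_convert_key` is an assumed helper name. The sketch only shows how a subclass that overrides `insert` silently skips an EAGLE-specific key conversion that lives in the base class, and how routing the override through the shared helper restores it.

```python
from typing import List


class BaseTree:
    """Toy stand-in for a radix tree (hypothetical, not SGLang's class)."""

    def __init__(self, is_eagle: bool = False):
        self.is_eagle = is_eagle
        self.inserted: List[list] = []

    def _convert_key(self, token_ids: List[int]) -> list:
        # Assumed behavior: bigram keys in EAGLE mode, raw tokens otherwise.
        if self.is_eagle:
            return list(zip(token_ids, token_ids[1:]))
        return list(token_ids)

    def insert(self, token_ids: List[int]) -> int:
        key = self._convert_key(token_ids)
        self.inserted.append(key)
        return len(key)


class OverridingTree(BaseTree):
    """Overrides insert without the EAGLE branch, so the conversion is lost."""

    def insert(self, token_ids: List[int]) -> int:
        self.inserted.append(list(token_ids))
        return len(token_ids)


class AdaptedTree(BaseTree):
    """Overrides insert but reuses the shared conversion helper."""

    def insert(self, token_ids: List[int]) -> int:
        key = self._convert_key(token_ids)
        # Subclass-specific bookkeeping would go here.
        self.inserted.append(key)
        return len(key)


tokens = [10, 11, 12, 13]
print(BaseTree(is_eagle=True).insert(tokens))        # 3 bigram keys
print(OverridingTree(is_eagle=True).insert(tokens))  # 4 raw-token keys
print(AdaptedTree(is_eagle=True).insert(tokens))     # 3 bigram keys
```

In this sketch the un-adapted override stores keys in a different format from the base tree, which is the kind of mismatch each overriding tree would need to be checked for.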


        return self._insert_helper(self.root_node, key, value)

    def cache_finished_req(self, req: Req):
Collaborator

While hiradix does not, the swa tree overrides this implementation as well.

Collaborator Author

The swa cache inherits from BaseRadixCache, so it seems all of these changes would need to be implemented again for it. HiCache derives from RadixCache, so it only needs some adaptation, with fewer overrides. For HiCache, my main concern is that the effective chunked prefill size changes slightly: if the chunked prefill size is 64, only 63 bigram keys are actually inserted into the tree. That may be inefficient for block-based cache offloading.
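The 64-versus-63 count can be sketched as follows. This is a minimal illustration, assuming a bigram key is simply each token paired with its successor; `to_bigram_keys` and `full_blocks` are hypothetical helpers, and the block size of 16 is an arbitrary example value rather than a HiCache setting.

```python
from typing import List, Tuple


def to_bigram_keys(token_ids: List[int]) -> List[Tuple[int, int]]:
    """Pair each token with its successor: n tokens yield n - 1 bigram keys."""
    return list(zip(token_ids, token_ids[1:]))


def full_blocks(num_keys: int, block_size: int) -> Tuple[int, int]:
    """Return (number of completely filled blocks, leftover keys)."""
    return divmod(num_keys, block_size)


chunk = list(range(64))            # one chunked-prefill chunk of 64 tokens
keys = to_bigram_keys(chunk)
print(len(keys))                   # 63

# With an assumed block size of 16, the last block is left one key short.
print(full_blocks(len(keys), 16))  # (3, 15)
```

Under these assumptions a 64-token chunk fills only three 16-key blocks and leaves 15 keys over, which is why block-aligned offloading may be less efficient.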

Collaborator

My understanding is that what we are doing here is primarily to resolve the conflict with eagle workers, since they share the same radix tree but have their own pool. It is not to add hicache support for eagle workers (i.e., having eagle workers fetch kv caches from host memory), which seems unnecessary and potentially complicated. Is that correct?

Collaborator Author

I agree that storing the eagle worker's kv cache in host memory is unnecessary, since it's only one layer. If we use HiCache only for the target model, can we still share the kv indices between the target and draft pools?

Collaborator

Yes, I think it should be fine. I just wanted to confirm that we are aligned on this.

Collaborator Author

@ispobock ispobock Sep 29, 2025

@ispobock ispobock merged commit 91847e3 into main Sep 30, 2025
124 of 150 checks passed
@ispobock ispobock deleted the ke/eagle-radix-cache branch September 30, 2025 15:00
PrinsYin pushed a commit to PrinsYin/sglang that referenced this pull request Oct 7, 2025
ch-tiger1 pushed a commit to ch-tiger1/sglang that referenced this pull request Oct 9, 2025