@timmy-feng
Added support for paged attention by doing the following:

  • Pre-allocate pages in the scheduler thread before calling run_batch. Since we do not know the fill status of the most recent page (it is still being written by the in-flight GPU batch), we allocate for the worst case, assuming every new token starts on a fresh page.
  • Alter the assign_draft_cache_locs kernel in the draft decode to prepend the remaining unused cache locs from the previous page. We do not have to free the excess here because the allocator state is restored after the draft.
  • Add a merge_cache_loc kernel to the verify step to prepend the remaining unused cache locs from the previous page. The excess pages are stored in an evict_cache_loc tensor, which is combined with the other pages evicted after tokens are accepted.
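The first and third bullets can be sketched in plain Python. This is a minimal illustration, not the PR's actual kernels: `worst_case_num_pages` and `merge_cache_locs` are hypothetical helper names, and the page size is an assumed constant (SGLang takes it from server configuration).

```python
import math

PAGE_SIZE = 4  # illustrative value; real page size comes from server config


def worst_case_num_pages(num_new_tokens: int, page_size: int = PAGE_SIZE) -> int:
    # The fill status of a request's most recent page is unknown while the
    # GPU batch is in flight, so assume the page is full: every new token
    # may land on a freshly allocated page.
    return math.ceil(num_new_tokens / page_size)


def merge_cache_locs(prev_page_unused: list[int],
                     new_locs: list[int],
                     num_new_tokens: int) -> tuple[list[int], list[int]]:
    # Prepend the leftover slots of the previous page, then keep only as
    # many slots as new tokens actually need; the remainder is the excess
    # that gets evicted (verify) or rolled back by allocator restore (draft).
    merged = prev_page_unused + new_locs
    return merged[:num_new_tokens], merged[num_new_tokens:]


# Example: 5 new tokens, 2 unused slots left on the previous page, and a
# worst-case allocation of 2 fresh pages (slots 8..15).
used, excess = merge_cache_locs([2, 3], [8, 9, 10, 11, 12, 13, 14, 15], 5)
```

With these numbers, the 5 tokens fill slots `[2, 3, 8, 9, 10]` and slots `[11, 12, 13, 14, 15]` are excess, mirroring how the over-allocated pages are later returned or evicted.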

TODO

Correctness has been verified for all attention backends except FA3.

With FA3, the code is correct for the draft decode and extend phases, but not for verify.
