[V1][Spec Decode] Implement Eagle Proposer [1/N] #15729
Conversation
Signed-off-by: Woosuk Kwon <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
vllm/v1/spec_decode/eagle.py
Outdated
logits.div_(sampling_metadata.temperature)
probs = logits.softmax(dim=-1, dtype=torch.float32)

# TODO(woosuk): Consider seeds?
I'm not sure seeds will work with conditional speculation and rejection sampling? :(
Added a TODO. I'd like to defer this to a future PR because 1) the sampling code in this PR needs some refactoring anyway, and 2) handling RNG in spec decoding is quite tricky, so I need more time to think about it.
Overall looks good to me, just have some questions. Also, we might want some tests for this PR? Maybe move some from #15346 if possible?
vllm/v1/spec_decode/eagle.py
Outdated
probs = logits.softmax(dim=-1, dtype=torch.float32)

# TODO(woosuk): Consider seeds?
q = torch.empty_like(logits)
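The `q` tensor is presumably filled with Exp(1) noise for the exponential-race (Gumbel-style) sampling trick: `argmax(probs / q)` with `q ~ Exp(1)` is an exact categorical sample from `probs`, with no `torch.multinomial` call. A minimal self-contained sketch under that assumption (function and variable names here are illustrative, not vLLM's API):

```python
import torch

def sample_draft_tokens(logits: torch.Tensor,
                        temperature: torch.Tensor) -> torch.Tensor:
    # Temperature-scale the logits per request, then form probabilities.
    logits = logits / temperature.unsqueeze(-1)
    probs = logits.softmax(dim=-1, dtype=torch.float32)
    # Exponential-race trick: argmax(probs / q), q ~ Exp(1), draws an
    # exact sample from the categorical distribution `probs`.
    q = torch.empty_like(probs).exponential_(1.0)
    return probs.div_(q).argmax(dim=-1)
```

A handy sanity check: with strongly peaked logits and a very low temperature, this reduces to greedy argmax.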
At least we should support seeding for reproducibility, i.e., so the draft head proposes the same tokens when seeded?
Added a TODO. I'd like to defer this to a future PR because 1) the sampling code in this PR needs some refactoring anyway, and 2) handling RNG in spec decoding is quite tricky, so I need more time to think about it.
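One way seeding could eventually work (a sketch of one possible design, not vLLM's actual implementation): draw the exponential noise from a per-request `torch.Generator` seeded with the request's seed, so a seeded request reproduces its draft tokens across runs.

```python
import torch

def seeded_exponential(shape: tuple, seed: int,
                       device: str = "cpu") -> torch.Tensor:
    # Per-request generator: the same seed always yields the same noise,
    # and hence the same draft tokens for a seeded request.
    gen = torch.Generator(device=device)
    gen.manual_seed(seed)
    return torch.empty(shape, device=device).exponential_(1.0, generator=gen)
```

The open question from this thread still applies: with conditional speculation and rejection sampling, the number of noise draws per request can vary step to step, so seed bookkeeping is trickier than in ordinary decoding.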
@njhill @LiuXiaoxuanPKU @sroy745 Thanks for the reviews. Because the integration of EAGLE involves a lot of changes, I tried to break it down into smaller steps. This PR is the first step, establishing the high-level design, so I intentionally left many TODOs and FIXMEs. The next steps should be:
I don't think I have the bandwidth for all of these, so it'd be nice if we could get some help from the community.
Sounds good. I am happy to take 2 since I've already started thinking about it. Let me create issues for 1, 3, 4, 5, 6 and move some of them to onboarding tasks.
@LiuXiaoxuanPKU Can you please take 1 as well?
Sure, I'll give it a try today.
LGTM. One small comment.
LGTM.
This pull request has merge conflicts that must be resolved before it can be merged.
    # Common case.
    next_token_id = token_ids[-1]
else:
    # Partial prefill (rare case).
Does this correspond to the case where we had to backtrack a bit to ensure num_computed_tokens is a multiple of the block size?
Or this handles chunked context?
This is for chunked context: EAGLE needs next_token_id as its input. In the decoding phase, the next token id is the last generated token id. In the chunked-prefill case, it's the prompt token that immediately follows the current chunk.
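The rule described above can be sketched as follows (the function and its parameters are illustrative; vLLM's actual request structures differ):

```python
def get_next_token_id(prompt_ids: list, output_ids: list,
                      num_computed_tokens: int,
                      num_scheduled_tokens: int) -> int:
    end = num_computed_tokens + num_scheduled_tokens
    if end >= len(prompt_ids):
        # Common case (decode / final prefill chunk): the last token
        # in the sequence, i.e. the most recently sampled one.
        return (prompt_ids + output_ids)[-1]
    # Partial prefill (rare case): the prompt token immediately after
    # the current chunk is the draft model's next input.
    return prompt_ids[end]
```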
@v-lmn Can you share how you ran this code? My understanding is that this code is not finished and cannot be executed until further PRs add the missing EAGLE functionality.
Signed-off-by: Woosuk Kwon <[email protected]> Signed-off-by: xinyuxiao <[email protected]>
Signed-off-by: Woosuk Kwon <[email protected]> Signed-off-by: Louis Ulmer <[email protected]>
Signed-off-by: Woosuk Kwon <[email protected]> Signed-off-by: Mu Huai <[email protected]>
Efficient implementation based upon #15346
NOTE 1: I intentionally used a dummy model instead of the real EAGLE models to reduce the scope of the PR.
NOTE 2: Currently, I intentionally ignore all sampling parameters except the temperature when sampling the draft tokens.
cc @sroy745 @LiuXiaoxuanPKU