[CB] bug fix: account for prefill token #320
Conversation
Signed-off-by: Yannick Schnider <[email protected]>
Signed-off-by: Yannick Schnider <[email protected]>
👋 Hi! Thank you for contributing to vLLM support on Spyre.
Now you are good to go 🚀
@@ -122,7 +122,7 @@ def round_up(t):
     tokens_to_generate = [
-        args.max_model_len - round_up(plen) for plen in prompt_lens
+        args.max_model_len + 1 - round_up(plen) for plen in prompt_lens
I cannot help but read plen as if it was one word, can we rename that?
-        args.max_model_len + 1 - round_up(plen) for plen in prompt_lens
+        args.max_model_len + 1 - round_up(p_len) for p_len in prompt_lens
or
-        args.max_model_len + 1 - round_up(plen) for plen in prompt_lens
+        args.max_model_len + 1 - round_up(prompt_len) for prompt_len in prompt_lens
+1 for prompt_len, we don't need to save the bytes.
Signed-off-by: Yannick Schnider <[email protected]>
[CB] bug fix: account for prefill token when asserting context length
Prefill already provides one new token (without requiring any KV cache for it).
Example: with a max model length of 2048, it is possible to prefill a prompt of 2048 tokens (32 blocks on Spyre) when only 1 output token is requested. A 33rd block is only needed if a 2nd output token were requested, and that would violate the max model length.
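A quick sketch to sanity-check the arithmetic above (the block size of 64 is an assumption inferred from the 2048-tokens-in-32-blocks figure; none of these names come from the vllm-spyre code):

```python
import math

BLOCK_SIZE = 64          # assumed: 2048 tokens == 32 blocks
MAX_MODEL_LEN = 2048

def blocks_needed(prompt_len: int, new_tokens: int) -> int:
    # Prefill produces the first output token without needing a KV-cache slot
    # for it, so cache is only required for the prompt plus any later tokens.
    cached_tokens = prompt_len + max(new_tokens - 1, 0)
    return math.ceil(cached_tokens / BLOCK_SIZE)

print(blocks_needed(2048, 1))  # 32 -> fits within max_model_len
print(blocks_needed(2048, 2))  # 33 -> a 2nd output token would exceed it
```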
Changes:
- prompt_padding_len + max_tokens - 1 <= max_model_len
- prompt_len <= max_model_len
- while setting new_tokens = max_model_len + 1 - prompt_padding_len
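Put together, a minimal sketch of the new bounds (function names here are illustrative, not the actual vllm-spyre helpers):

```python
def request_fits(prompt_len: int, prompt_padding_len: int, max_tokens: int,
                 max_model_len: int) -> bool:
    # The first output token comes from prefill and needs no extra context
    # slot, hence the "- 1" on max_tokens.
    return (prompt_len <= max_model_len
            and prompt_padding_len + max_tokens - 1 <= max_model_len)

def max_new_tokens(prompt_padding_len: int, max_model_len: int) -> int:
    # Largest number of output tokens a padded prompt can still request.
    return max_model_len + 1 - prompt_padding_len

assert request_fits(2048, 2048, 1, 2048)      # prefill-only request fits
assert not request_fits(2048, 2048, 2, 2048)  # a 2nd token would not
assert max_new_tokens(2048, 2048) == 1
```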