[CB] Support pseudo batch size 1 for decode, adjust warmup #287


Merged
14 commits merged into main on Jul 29, 2025

Conversation

yannicks1
Collaborator

@yannicks1 yannicks1 commented Jul 8, 2025

[CB] enable warmup for batch size 1

in #285 we wanted to allow batch size 1 for continuous batching. However, the warmup did not support batch size 1. With this small fix it does work on CPU at least. It does not currently compile on the card. Would need to compare graphs next...


github-actions bot commented Jul 8, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

yannicks1 added 2 commits July 8, 2025 13:17
Signed-off-by: Yannick Schnider <[email protected]>
Signed-off-by: Yannick Schnider <[email protected]>
@yannicks1
Collaborator Author

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"

@yannicks1 yannicks1 changed the title remove assertion batch size >= 2 (for warmup) [CB] enable warmup for batch size 1 Jul 8, 2025
@yannicks1 yannicks1 marked this pull request as ready for review July 8, 2025 14:12
@yannicks1 yannicks1 requested a review from joerunde July 8, 2025 14:12
@prashantgupta24
Collaborator

prashantgupta24 commented Jul 8, 2025

the warmup did not support batch size 1

Naive question: can we add a failing test that this PR fixes?

Edit: I guess we had the assert earlier which prevented batch size 1 from running 🤷

@joerunde
Collaborator

joerunde commented Jul 8, 2025

Ah interesting - I swear I had tested manually on a dev pod and saw batch size 1 working, but I don't know how that was happening when the test clearly failed 🤦.

Anyway nice fix!

@@ -317,6 +318,18 @@ def _warmup_spyre_dynamic_size(self, special_token_ids):
prompt_len = 42
num_decode_tokens = 2

# Fix for batch size 1: set input batch to fit 2 requests for warmup
if model_runner.vllm_config.scheduler_config.max_num_seqs == 1:
    model_runner.input_batch = InputBatch(
Collaborator
Alternatively, could the InputBatch construct itself with:

self.max_num_reqs = min(max_num_reqs, 2)

since we know that it'll always need at least 2, and then we avoid reconstructing it in the worker here? That way we have a much smaller diff to back out once we can lift this bs>=2 restriction

Collaborator Author

Not sure if I follow here. It has to be >= 2 for the warmup. With min(1, 2) we would still fail?

Collaborator

would that work if you directly set model_runner.input_batch.max_num_reqs = 2, instead of instantiating a new InputBatch?

Collaborator Author

No, because max_num_reqs is consumed during InputBatch initialization, so setting model_runner.input_batch.max_num_reqs after the fact has no effect.

Collaborator Author

max_num_seqs occurs 17 times in the init of InputBatch. It is not a single attribute but is used to construct several attributes, so re-initializing is simpler...
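
For context, the workaround discussed above amounts to rebuilding the input batch with capacity for two requests before warmup runs. A minimal sketch of the idea follows; the InputBatch constructor parameters are assumptions for illustration, not the exact vllm-spyre signature:

# Sketch only: give warmup room for 2 requests even when the user
# configured --max_num_seqs 1. Constructor parameters are assumed.
if model_runner.vllm_config.scheduler_config.max_num_seqs == 1:
    model_runner.input_batch = InputBatch(
        max_num_reqs=2,  # warmup drives two dummy requests
        max_model_len=model_runner.vllm_config.model_config.max_model_len,
        device=model_runner.device,
        pin_memory=model_runner.pin_memory,
        vocab_size=model_runner.vllm_config.model_config.get_vocab_size(),
    )

Since max_num_seqs feeds many derived attributes inside __init__, constructing a fresh InputBatch like this is simpler than patching the attributes one by one.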

@yannicks1
Collaborator Author

I tried to test this on the card, but it failed. Not clear to me why though. Will have to compare graphs tomorrow.

@yannicks1 yannicks1 self-assigned this Jul 9, 2025
Base automatically changed from cb-batch-size-1 to main July 9, 2025 19:23
@yannicks1
Collaborator Author

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"

1 similar comment
@waleedqk
Collaborator

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"

@yannicks1
Collaborator Author

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"

@yannicks1 yannicks1 changed the title [CB] enable warmup for batch size 1 [CB][do not merge] enable warmup for batch size 1 Jul 18, 2025
Signed-off-by: Yannick Schnider <[email protected]>
@JRosenkranz
Collaborator

I tried to test this on the card, but it failed. Not clear to me why though. Will have to compare graphs tomorrow.

@yannicks1 in order to support this, we will need to add the following, which enables symbolic shapes for size-1 dimensions (if marked dynamic):

from torch.fx.experimental import _config as config
config.backed_size_oblivious = True
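
For illustration, here is a minimal, self-contained sketch of how that flag interacts with a size-1 dynamic dimension. This is not code from the PR; it assumes a recent PyTorch (2.7+) where torch.compile and torch._dynamo.mark_dynamic are available:

import torch
from torch.fx.experimental import _config as fx_config

# Without this flag the compiler specializes 0/1-sized dimensions even if
# they are marked dynamic; with it, a batch dimension of 1 stays symbolic.
fx_config.backed_size_oblivious = True

def decode_step(x: torch.Tensor) -> torch.Tensor:
    return x * 2.0

compiled = torch.compile(decode_step)

batch = torch.randn(1, 8)             # batch of a single sequence
torch._dynamo.mark_dynamic(batch, 0)  # mark the batch dimension dynamic
print(compiled(batch).shape)          # torch.Size([1, 8])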

@yannicks1 yannicks1 marked this pull request as draft July 18, 2025 12:00
@yannicks1
Collaborator Author

Thanks for the hint @JRosenkranz! I think your suggested change should be tried with #312, though, which actually uses torch 2.7.1 and applies a real batch size 1. This PR (#287) does padding under the hood; it is merely supposed to support --max_num_seqs 1 from a scheduling perspective...

@yannicks1 yannicks1 changed the title [CB][do not merge] enable warmup for batch size 1 [CB] Support pseudo batch size 1 for decode, adjust warmup Jul 24, 2025
@yannicks1
Collaborator Author

bot:test
TEST_FILE=tests/e2e/test_spyre_basic.py MARKERS="spyre"

@yannicks1
Collaborator Author

I found the culprit and fixed it here. Tested manually on the card and it works now! Ready for review!

@yannicks1 yannicks1 marked this pull request as ready for review July 24, 2025 11:51
@yannicks1 yannicks1 enabled auto-merge (squash) July 24, 2025 11:54
@github-actions github-actions bot added the ready label Jul 24, 2025
Collaborator

@prashantgupta24 prashantgupta24 left a comment

Can we not wait for #312 instead?

@yannicks1
Collaborator Author

@prashantgupta24 We don't know when support for 'true' batch size 1 for decode will land; it requires changes lower in the stack. However, the request for batch size 1 is here, and with this workaround we are able to support it. While I agree that #312 cleans up the logic and is nicer from a code point of view, note that the performance benefit of true batch size 1 over batch size 2 is marginal. As the compiler team has more pressing issues, #312 has been de-prioritized and we want to merge this instead...
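
To make 'pseudo batch size 1' concrete: when the scheduler admits only one sequence, the decode batch is padded to two rows so the compiled graph never sees a batch dimension of 1. A hypothetical sketch, with names and shapes chosen for illustration rather than taken from the PR:

import torch

def pad_decode_batch(input_ids: torch.Tensor, min_batch: int = 2) -> torch.Tensor:
    # Pad a decode batch with dummy rows up to min_batch sequences.
    # Illustrative only: a real implementation must also pad positions,
    # masks and KV-cache bookkeeping, and discard the dummy row's logits.
    n = input_ids.shape[0]
    if n >= min_batch:
        return input_ids
    pad = input_ids.new_zeros((min_batch - n, *input_ids.shape[1:]))
    return torch.cat([input_ids, pad], dim=0)

# One decode token for one sequence becomes an effective batch of two:
single = torch.tensor([[42]])
assert pad_decode_batch(single).shape[0] == 2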

@yannicks1
Collaborator Author

I removed the test case max_num_seqs=1 here to save some runs. Can we please get this approved and into main, @maxdebayser @prashantgupta24 @wallashss?

@prashantgupta24
Collaborator

I'll try to take a look at this after lunch!

@yannicks1 yannicks1 requested a review from wallashss July 26, 2025 08:57
@yannicks1
Collaborator Author

@wallashss @joerunde @prashantgupta24 @maxdebayser can we please get this in...

Collaborator

@maxdebayser maxdebayser left a comment
Sorry, I somehow missed the comment where you had already answered about the re-initialization of the input batch. That was my last remaining question, I think.

@yannicks1 yannicks1 merged commit fb8011a into main Jul 29, 2025
15 of 18 checks passed
@yannicks1 yannicks1 deleted the cb-batch-size-1-test branch July 29, 2025 13:56
joerunde added a commit that referenced this pull request Jul 29, 2025