[CB] Support pseudo batch size 1 for decode, adjust warmup #287


Merged
14 commits merged into main on Jul 29, 2025

Conversation

yannicks1
Collaborator

@yannicks1 yannicks1 commented Jul 8, 2025

[CB] enable warmup for batch size 1

in #285 we wanted to allow batch size 1 for continuous batching. However, the warmup did not support batch size 1. With this small fix it does work on CPU at least. It does not currently compile on the card. Would need to compare graphs next...


github-actions bot commented Jul 8, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

yannicks1 added 2 commits July 8, 2025 13:17
Signed-off-by: Yannick Schnider <[email protected]>
Signed-off-by: Yannick Schnider <[email protected]>
@yannicks1
Collaborator Author

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"

@yannicks1 yannicks1 changed the title remove assertion batch size >= 2 (for warmup) [CB] enable warmup for batch size 1 Jul 8, 2025
@yannicks1 yannicks1 marked this pull request as ready for review July 8, 2025 14:12
@yannicks1 yannicks1 requested a review from joerunde July 8, 2025 14:12
@prashantgupta24
Collaborator

prashantgupta24 commented Jul 8, 2025

the warmup did not support batch size 1

Naive question: can we add a failing test that this PR fixes?

Edit: I guess we had the assert earlier which prevented batch size 1 from running 🤷

@joerunde
Collaborator

joerunde commented Jul 8, 2025

Ah interesting - I swear I had tested manually on a dev pod and saw batch size 1 working, but I don't know how that was happening when the test clearly failed 🤦.

Anyway nice fix!

@@ -317,6 +318,18 @@ def _warmup_spyre_dynamic_size(self, special_token_ids):
prompt_len = 42
num_decode_tokens = 2

# Fix for batch size 1: set input batch to fit 2 requests for warmup
if model_runner.vllm_config.scheduler_config.max_num_seqs == 1:
    model_runner.input_batch = InputBatch(
Collaborator
Alternatively, could the InputBatch construct itself with:

self.max_num_reqs = min(max_num_reqs, 2)

since we know that it'll always need at least 2, and then we avoid reconstructing it in the worker here? That way we have a much smaller diff to back out once we can lift this bs>=2 restriction

Collaborator Author

Not sure if I follow here. It has to be >= 2 for the warmup. With min(1, 2) we would still fail?

Collaborator

would that work if you directly set model_runner.input_batch.max_num_reqs = 2, instead of instantiating a new InputBatch?

Collaborator Author

No, because max_num_reqs is consumed during InputBatch initialization, so setting model_runner.input_batch.max_num_reqs after the fact has no effect.

Collaborator Author

max_num_seqs occurs 17 times in the init of InputBatch. It is not a single attribute but is used to construct several attributes, so re-initializing is simpler...
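
For context, the workaround discussed above amounts to rebuilding the input batch with capacity for two requests before warmup runs. A minimal sketch of the idea follows; the InputBatch constructor parameters are assumptions for illustration, not the exact vllm-spyre signature:

# Sketch only: give warmup room for 2 requests even when the user
# configured --max_num_seqs 1. Constructor parameters are assumed.
if model_runner.vllm_config.scheduler_config.max_num_seqs == 1:
    model_runner.input_batch = InputBatch(
        max_num_reqs=2,  # warmup drives two dummy requests
        max_model_len=model_runner.vllm_config.model_config.max_model_len,
        device=model_runner.device,
        pin_memory=model_runner.pin_memory,
        vocab_size=model_runner.vllm_config.model_config.get_vocab_size(),
    )

Since max_num_seqs feeds many derived attributes inside __init__, constructing a fresh InputBatch like this is simpler than patching the attributes one by one.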

@yannicks1
Collaborator Author

I tried to test this on the card, but it failed. Not clear to me why though. Will have to compare graphs tomorrow.

@yannicks1 yannicks1 self-assigned this Jul 9, 2025
Base automatically changed from cb-batch-size-1 to main July 9, 2025 19:23
@yannicks1
Collaborator Author

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"

1 similar comment
@waleedqk
Collaborator

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"

@yannicks1
Collaborator Author

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"

@yannicks1 yannicks1 changed the title [CB] enable warmup for batch size 1 [CB][do not merge] enable warmup for batch size 1 Jul 18, 2025
Signed-off-by: Yannick Schnider <[email protected]>
@JRosenkranz
Collaborator

I tried to test this on the card, but it failed. Not clear to me why though. Will have to compare graphs tomorrow.

@yannicks1 in order to support this, we will need to add the following, which enables symbolic shapes for size-1 dimensions (if marked dynamic):

from torch.fx.experimental import _config as config
config.backed_size_oblivious = True
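
For illustration, here is a minimal, self-contained sketch of how that flag interacts with a size-1 dynamic dimension. This is not code from the PR; it assumes a recent PyTorch (2.7+) where torch.compile and torch._dynamo.mark_dynamic are available:

import torch
from torch.fx.experimental import _config as fx_config

# Without this flag the compiler specializes 0/1-sized dimensions even if
# they are marked dynamic; with it, a batch dimension of 1 stays symbolic.
fx_config.backed_size_oblivious = True

def decode_step(x: torch.Tensor) -> torch.Tensor:
    return x * 2.0

compiled = torch.compile(decode_step)

batch = torch.randn(1, 8)             # batch of a single sequence
torch._dynamo.mark_dynamic(batch, 0)  # mark the batch dimension dynamic
print(compiled(batch).shape)          # torch.Size([1, 8])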

@yannicks1 yannicks1 marked this pull request as draft July 18, 2025 12:00
@yannicks1
Collaborator Author

Thanks for the hint @JRosenkranz! I think your suggested change should be tried with #312, though, which actually uses torch 2.7.1 and applies a real batch size 1. This PR (#287) does padding under the hood; it is merely supposed to support --max_num_seqs 1 from a scheduling perspective...

@yannicks1 yannicks1 changed the title [CB][do not merge] enable warmup for batch size 1 [CB] Support pseudo batch size 1 for decode, adjust warmup Jul 24, 2025
@yannicks1
Collaborator Author

bot:test
TEST_FILE=tests/e2e/test_spyre_basic.py MARKERS="spyre"

@yannicks1
Collaborator Author

I found the culprit and fixed it here. Tested manually on the card and it works now! Ready for review!

@yannicks1 yannicks1 marked this pull request as ready for review July 24, 2025 11:51
@yannicks1 yannicks1 enabled auto-merge (squash) July 24, 2025 11:54
@github-actions github-actions bot added the ready label Jul 24, 2025
Collaborator

@prashantgupta24 prashantgupta24 left a comment

Can we not wait for #312 instead?

@yannicks1
Collaborator Author

@prashantgupta24 We don't know when support for 'true' batch size 1 for decode will land; it requires changes lower in the stack. However, the request for batch size 1 is here, and with this workaround we are able to support it. While I agree that #312 cleans up the logic and is nicer from a code point of view, note that the performance benefit of true batch size 1 over batch size 2 is marginal. As the compiler team has more pressing issues, #312 has been de-prioritized and we want to merge this instead...
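
To make 'pseudo batch size 1' concrete: when the scheduler admits only one sequence, the decode batch is padded to two rows so the compiled graph never sees a batch dimension of 1. A hypothetical sketch, with names and shapes chosen for illustration rather than taken from the PR:

import torch

def pad_decode_batch(input_ids: torch.Tensor, min_batch: int = 2) -> torch.Tensor:
    # Pad a decode batch with dummy rows up to min_batch sequences.
    # Illustrative only: a real implementation must also pad positions,
    # masks and KV-cache bookkeeping, and discard the dummy row's logits.
    n = input_ids.shape[0]
    if n >= min_batch:
        return input_ids
    pad = input_ids.new_zeros((min_batch - n, *input_ids.shape[1:]))
    return torch.cat([input_ids, pad], dim=0)

# One decode token for one sequence becomes an effective batch of two:
single = torch.tensor([[42]])
assert pad_decode_batch(single).shape[0] == 2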

@yannicks1
Collaborator Author

I removed the test case max_num_seqs=1 here to save some runs. Can we please get this approved and into main, @maxdebayser @prashantgupta24 @wallashss?

@prashantgupta24
Collaborator

I'll try to take a look at this after lunch!

@yannicks1 yannicks1 requested a review from wallashss July 26, 2025 08:57
@yannicks1
Collaborator Author

@wallashss @joerunde @prashantgupta24 @maxdebayser can we please get this in...

Collaborator

@maxdebayser maxdebayser left a comment
Sorry, I somehow missed the comment where you had already answered about the re-initialization of the input batch. That was my last remaining question, I think.

@yannicks1 yannicks1 merged commit fb8011a into main Jul 29, 2025
15 of 18 checks passed
@yannicks1 yannicks1 deleted the cb-batch-size-1-test branch July 29, 2025 13:56
joerunde added a commit that referenced this pull request Jul 29, 2025