V1 embeddings #277


Merged
merged 44 commits into main from v1_embeddings on Jul 28, 2025

Conversation

maxdebayser
Collaborator

@maxdebayser maxdebayser commented Jul 2, 2025

Description

This PR enables embedding models on vllm V1. In contrast with the V1 GPU implementation, here I added a separate model runner because for most of the embedding models there is no need for continuous batching. To avoid code repetition, I refactored the input batch and model runner classes into a class hierarchy with common base classes.
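The refactoring described here can be pictured with a minimal sketch. All class and attribute names below are illustrative placeholders, not the actual vllm_spyre API:

```python
# Hypothetical sketch of an input-batch class hierarchy with a common base,
# as described in the PR. The real vllm_spyre classes differ.
from dataclasses import dataclass, field


@dataclass
class BaseInputBatch:
    """State and logic shared by all Spyre input batches."""
    req_ids: list[str] = field(default_factory=list)

    def add_request(self, req_id: str) -> None:
        self.req_ids.append(req_id)


class SamplingInputBatch(BaseInputBatch):
    """Batch for decoder (generative) models."""


class PoolingInputBatch(BaseInputBatch):
    """Batch for embedding models: whole batches are processed at once,
    so no continuous-batching bookkeeping is needed."""
```

Shared request bookkeeping lives in the base class, while each subclass only carries what its model type needs.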

@gmarinho2 contributed a test that verifies that the returned embeddings don't change with batch size.
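The idea of that test can be sketched as follows; `embed` below is a toy deterministic stand-in for the real model call, which in the actual test goes through vLLM's embedding entrypoint:

```python
# Sketch of a batch-size invariance check like the one contributed to this PR.
# `embed` is a hypothetical stand-in, not the real model call.
import numpy as np


def embed(texts: list[str]) -> np.ndarray:
    # Toy embedding: seed a RNG per text so output depends only on the text,
    # never on which batch the text arrived in.
    rngs = [np.random.default_rng(abs(hash(t)) % (2**32)) for t in texts]
    return np.stack([r.standard_normal(8) for r in rngs])


def check_batch_size_invariance(texts: list[str]) -> None:
    one_by_one = np.concatenate([embed([t]) for t in texts])
    batched = embed(texts)
    assert np.allclose(one_by_one, batched, atol=1e-6)
```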


github-actions bot commented Jul 2, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure your code passes all the linting checks, otherwise your PR cannot be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

maxdebayser and others added 21 commits July 2, 2025 15:56
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
The changes introduced by PR
vllm-project/vllm#16728
to the sampler architecture were incompatible with
our spyre model runner.

Initially, as a stopgap solution, I copied the old sampling
classes into our vllm_spyre tree just so that we can keep
working on the latest changes from main.

Now this commit reverts that and makes the same logits
processor logic work for the spyre input batch and model
runner classes. The difference from the GPU model runner is
that on Spyre we don't condense the batch but keep a boolean
mask that is used to calculate "dense" request indices. These
indices must be used for the BatchUpdateBuilder because they
are the right ones to slice the `logits` tensor that is passed
to the Sampler.
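A minimal sketch of the mask-to-dense-index mapping the commit message describes (variable names are illustrative, not taken from the code):

```python
# Sketch: map a boolean occupancy mask over batch slots to "dense" request
# indices, i.e. positions in the condensed logits tensor seen by the Sampler.
import numpy as np

# True where a batch slot holds an active request.
mask = np.array([True, False, True, True, False])

# For each slot, the dense index it would have after condensing the batch.
dense_index = np.cumsum(mask) - 1

# Dense indices of the active slots, in slot order.
active_dense = dense_index[mask]
```

The running sum counts how many active slots precede (and include) each position, so active slots get consecutive dense indices starting at 0.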

Signed-off-by: Max de Bayser <[email protected]>
…upstream (#245)"

This reverts commit 962abf1.

Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Gabriel Marinho <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
@maxdebayser maxdebayser changed the title [WIP] V1 embeddings V1 embeddings Jul 15, 2025
@maxdebayser maxdebayser marked this pull request as ready for review July 15, 2025 22:53
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
@maxdebayser maxdebayser enabled auto-merge (squash) July 22, 2025 02:50
@maxdebayser
Collaborator Author

All tests are passing now after the changes from the first round of reviews.

@github-actions github-actions bot added the ready label Jul 22, 2025
Signed-off-by: Max de Bayser <[email protected]>
@yannicks1
Collaborator

From my side this looks good! I have one small thing, apart from the tiny merge conflicts that @maxdebayser will solve within seconds :)

IMO we have one or two files too many in vllm_spyre/v1/worker; it looks cluttered with the three input-batch-related .py files. Upstream has only one input batch file per accelerator (e.g. GPU, TPU), and gpu_input_batch.py alone is 760 lines. Two options:

1. Merge vllm_spyre/v1/worker/spyre_base_input_batch.py, vllm_spyre/v1/worker/spyre_input_batch.py and vllm_spyre/v1/worker/spyre_pooling_input_batch.py into a single file (called spyre_input_batch.py) of around 800 lines.
2. Move only the base class from vllm_spyre/v1/worker/spyre_base_input_batch.py into vllm_spyre/v1/worker/spyre_input_batch.py, similar to how the model runner base class lives in vllm_spyre/v1/worker/spyre_model_runner.py rather than in a separate file.

@yannicks1
Collaborator

Also hoping for @joerunde to give this a final pass (as it is a really big refactoring) and merge once he is back:)

@wallashss
Collaborator

I think it would be nice to update docs/supported_features.md

torch.Tensor, self.token_type_ids_cpu_tensor).numpy()
return self._token_type_ids_cpu

def has_token_types(self) -> bool:
Collaborator

I'm pretty sure this method is not being used, and I was wondering whether the related tensors, like token_type_ids_cpu_tensor, are being used either. I could find them being populated, but not read, if I did the reading correctly.

Collaborator Author

Yes, this is to prepare for changes that haven't been merged upstream yet. I can remove these to simplify the PR for now.

It's not clear in what shape the support will be added upstream.

Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Collaborator

@wallashss wallashss left a comment

LGTM! Thanks for the refactoring of input batch, it makes sense to me.

Just a friendly reminder: update docs/supported_features.md

@maxdebayser maxdebayser merged commit 02f4e63 into main Jul 28, 2025
15 of 18 checks passed
@maxdebayser maxdebayser deleted the v1_embeddings branch July 28, 2025 18:58
@yannicks1 yannicks1 mentioned this pull request Jul 29, 2025
yannicks1 added a commit that referenced this pull request Jul 31, 2025
### [v1] remove v0 code

Now that we have v1 support for embedding models (#277), we can finally
delete the v0 code.
Note: for decoder models, v0 support was deprecated some time ago.

---------

Signed-off-by: Yannick Schnider <[email protected]>
7 participants