This release:
- 🎉 Supports embedding models on vLLM v1!
- 🔥 Removes all remaining support for vLLM v0
- ⚡ Contains performance and stability fixes for continuous batching
- ⚗️ Support for up to `--max-num-seqs 4 --max-model-len 8192 --tensor-parallel-size 4` has been tested on ibm-granite/granite-3.3-8b-instruct
- 📦 Officially supports vLLM 0.9.2 and 0.10.0
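As a sketch, the tested limits above correspond to a server launch along these lines (flags taken from the notes; assumes a Spyre-enabled vLLM install on appropriate hardware):

```shell
# Serve ibm-granite/granite-3.3-8b-instruct at the tested limits:
# up to 4 concurrent sequences, 8192-token context, 4-way tensor parallelism.
vllm serve ibm-granite/granite-3.3-8b-instruct \
    --max-num-seqs 4 \
    --max-model-len 8192 \
    --tensor-parallel-size 4
```

Configurations beyond these values have not been stated as tested in this release.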
What's Changed
- [SB] relax constraint on min number of new tokens by @yannicks1 in #322
- [CB] bug fix: account for prefill token by @yannicks1 in #320
- Documents a bit CB script and tests by @sducouedic in #300
- 🧪 add long context test by @joerunde in #330
- [docs] Add install from PyPI to docs by @ckadner in #327
- ⬆️ bump base image by @joerunde in #328
- [ppc64le] Introduce ppc64le benchmarking scripts by @Daniel-Schenker in #311
- [CB] Override number of Spyre blocks: replace env var with top level argument by @yannicks1 in #331
- [CB] Add scheduling tests by @sducouedic in #329
- 🎨 add values in test asserts by @prashantgupta24 in #333
- [CB] Refactoring/Cleaning up prepare_prompt/decode by @yannicks1 in #335
- feat: enable FP8 quantized models loading by @rafvasq in #316
- ♻️ Compatibility with vllm main by @prashantgupta24 in #338
- V1 embeddings by @maxdebayser in #277
- feat: detect CPUs and configure threading sensibly by @tjohnson31415 in #291
- [CB] Support pseudo batch size 1 for decode, adjust warmup by @yannicks1 in #287
- fix introduced merge conflict on main by @yannicks1 in #345
- Add CB API tests on the correct use of max_tokens by @gmarinho2 in #339
- ♻️ fix vllm:main by @prashantgupta24 in #341
- [CB] Optimization: Reduce wastage in prefill compute and pad blocks in homogeneous continuous batching by @yannicks1 in #262
- [CI] Tests for graph comparison between vllm and AFTU by @wallashss in #286
- [CB] refactoring warmup for batch size 1 by @yannicks1 in #347
- [CB][Tests] Check output of scheduling tests on Spyre by @sducouedic in #337
- [v1] remove v0 code by @yannicks1 in #344
- ♻️ enable offline mode in GHA tests by @prashantgupta24 in #349
- ⬆️ bump base image with more CB fixes by @joerunde in #351
- Upstream compatibility tests by @maxdebayser in #343
- ⬆️ Bump locked vllm to 0.10.0 by @joerunde in #352
New Contributors
- @Daniel-Schenker made their first contribution in #311
Full Changelog: v0.5.3...v0.6.0