Introduce support for dynamic batching #1662

baijumeswani · 2025-07-29T17:31:41Z

The changes in #1580 introduce a new API in onnxruntime-genai that enables continuous submission of requests to a generation engine. Internally, the scheduler statically batches these requests, allowing the API to interface seamlessly with models designed for static batching.
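As a rough illustration of what "statically batching" continuously submitted requests can mean, here is a minimal sketch. It is not the scheduler from #1580; `StaticBatchScheduler`, `add_request`, and `next_batch` are hypothetical names, and the policy (fill a fixed-size batch in arrival order) is an assumption made for the example.

```python
from collections import deque


class StaticBatchScheduler:
    """Hypothetical scheduler: requests arrive at any time, but are
    grouped into fixed-size batches before being handed to the model."""

    def __init__(self, max_batch_size: int) -> None:
        self.max_batch_size = max_batch_size
        self.pending = deque()  # request ids waiting to be scheduled

    def add_request(self, request_id: str) -> None:
        # Submission is decoupled from execution: callers can add
        # requests while earlier batches are still running.
        self.pending.append(request_id)

    def next_batch(self) -> list:
        # Take up to max_batch_size requests in arrival order.
        batch = []
        while self.pending and len(batch) < self.max_batch_size:
            batch.append(self.pending.popleft())
        return batch


scheduler = StaticBatchScheduler(max_batch_size=2)
for i in range(5):
    scheduler.add_request(f"req-{i}")

first = scheduler.next_batch()
second = scheduler.next_batch()
third = scheduler.next_batch()
```

In this sketch a batch, once formed, does not change: requests that arrive later wait for the next call to `next_batch`. That restriction is what dynamic batching is meant to lift.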

This pull request lays the groundwork for dynamic request batching by incorporating concepts from vLLM. Specifically, it implements a block-based key-value cache manager.
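To make the idea of a block-based key-value cache manager concrete, the sketch below shows only the bookkeeping side: a pool of fixed-size blocks, a per-request block table, and allocation that grows a request's table as tokens are appended. It is an illustration under stated assumptions, not the implementation in this pull request; `BlockAllocator`, its methods, and the block size are all hypothetical.

```python
class BlockAllocator:
    """Hypothetical block-based KV cache bookkeeping (illustrative names)."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size                 # tokens stored per block
        self.free_blocks = list(range(num_blocks))   # ids of unused blocks
        self.block_tables = {}                       # request id -> block ids
        self.num_tokens = {}                         # request id -> cached tokens

    def blocks_needed(self, request_id: str, new_tokens: int) -> int:
        # Blocks required for the new total, minus blocks already held.
        total = self.num_tokens.get(request_id, 0) + new_tokens
        required = -(-total // self.block_size)      # ceiling division
        held = len(self.block_tables.get(request_id, []))
        return max(0, required - held)

    def allocate(self, request_id: str, new_tokens: int) -> list:
        needed = self.blocks_needed(request_id, new_tokens)
        if needed > len(self.free_blocks):
            # Checked before any state changes, so a failed call is a no-op.
            raise MemoryError("not enough free KV cache blocks")
        table = self.block_tables.setdefault(request_id, [])
        for _ in range(needed):
            table.append(self.free_blocks.pop())
        self.num_tokens[request_id] = self.num_tokens.get(request_id, 0) + new_tokens
        return table

    def free(self, request_id: str) -> None:
        # Return all of a finished request's blocks to the pool.
        self.free_blocks.extend(self.block_tables.pop(request_id, []))
        self.num_tokens.pop(request_id, None)


allocator = BlockAllocator(num_blocks=4, block_size=16)
allocator.allocate("a", 20)   # 20 tokens need 2 blocks
allocator.allocate("a", 12)   # 32 tokens still fit in 2 blocks
allocator.allocate("a", 1)    # 33 tokens need a third block
```

Because blocks need not be contiguous, a request's cache can grow one block at a time, and a finished request's blocks can be reused immediately by another request.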

To support dynamic batching, the underlying model must accept variable-length inputs (rather than fixed-size static batches) and use the PagedAttention contrib-op.
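The difference between a fixed-size static batch and variable-length inputs can be sketched as follows. A padded batch stores every sequence at the length of the longest one, while a packed layout concatenates the tokens and records where each sequence starts and ends. This is a common layout for variable-length attention kernels; whether the PagedAttention contrib-op takes exactly these inputs is an assumption, and `pad_sequences` and `pack_sequences` are hypothetical helper names.

```python
def pad_sequences(sequences, pad_id=0):
    """Static-batch layout: every row is padded to the longest sequence."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_id] * (max_len - len(seq)) for seq in sequences]


def pack_sequences(sequences):
    """Variable-length layout: one flat token list plus cumulative lengths.

    cumulative_lengths[i] and cumulative_lengths[i + 1] delimit the tokens
    that belong to sequence i.
    """
    tokens = []
    cumulative_lengths = [0]
    for seq in sequences:
        tokens.extend(seq)
        cumulative_lengths.append(cumulative_lengths[-1] + len(seq))
    return tokens, cumulative_lengths


prompts = [[11, 12, 13], [21], [31, 32]]

padded = pad_sequences(prompts)
# [[11, 12, 13], [21, 0, 0], [31, 32, 0]]

packed, offsets = pack_sequences(prompts)
# packed  == [11, 12, 13, 21, 31, 32]
# offsets == [0, 3, 4, 6]
```

The packed form spends no compute on padding tokens, and sequences of very different lengths (for example, a long prompt next to a single decode token) can share one forward pass.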

…into baijumeswani/engine

baijumeswani and others added 15 commits July 1, 2025 16:20

Introduce the Engine to support continuous batching

eedc59e

Address pipeline failures

463cf13

static cast size_t to int

9a181e0

Continuous batching python example

e7c12b0

Cleanup

41cff2e

Check for non empty ready requests

cd9a151

Merge branch 'main' of https://github.com/microsoft/onnxruntime-genai …

475b9e6

…into baijumeswani/engine

Support cpu models

28fd232

Lint error

914fd8c

Support webgpu models

6282887

Merge branch 'main' of https://github.com/microsoft/onnxruntime-genai …

0792bfc

…into baijumeswani/engine

Address pull request review comments

0b85594

Merge branch 'main' of https://github.com/microsoft/onnxruntime-genai …

fb7ee06

…into baijumeswani/engine

Introduce support for dynamic batching

34f7f75

Fix windows pipelines

f3a7d7a

Base automatically changed from baijumeswani/engine to main August 4, 2025 16:23
