[P/D] NIXL Integration #17751

Merged 100 commits on May 12, 2025

robertgshaw2-redhat (Collaborator) commented May 6, 2025

SUMMARY:

  • support "dynamo-style" direct KV cache transfer
  • support fully async send/recv
  • support runtime NIXL handshake
  • support xPyD
  • support homogeneous TP > 1
  • support P->D request flow

FOLLOW UPS:

  • support D->P request flow (dynamo-style request flow)
  • support heterogeneous TP
  • support DP attention
  • improve robustness to failures
  • consider more edge cases (prompt logprobs, parallel sampling)
  • support local attention

tlrmchlsmth and others added 13 commits May 3, 2025 11:13
* [Update] LMcache connector v1 implementation

Signed-off-by: ApostaC <[email protected]>

* [Add] examples for disaggregated prefill

Signed-off-by: ApostaC <[email protected]>

* [add] extra information about evns

Signed-off-by: ApostaC <[email protected]>

* Initial stubs for P/D scheduling changes

Signed-off-by: Tyler Michael Smith <[email protected]>

* Updates

Signed-off-by: Tyler Michael Smith <[email protected]>

* Rs branch (#3)

* updated

Signed-off-by: [email protected] <[email protected]>

* Rs branch (#5)

Signed-off-by: [email protected] <[email protected]>

* Remove Unneeded Arguments (#7)

* updated

Signed-off-by: [email protected] <[email protected]>

* stash

Signed-off-by: [email protected] <[email protected]>

* cleanup

Signed-off-by: [email protected] <[email protected]>

---------

Signed-off-by: [email protected] <[email protected]>

* Improve disagg-example.sh (#8)

- fix spelling
- CUDA_VISIBLE_DEVICES should be set externally

Signed-off-by: Tyler Michael Smith <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* added connector

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* update

Signed-off-by: [email protected] <[email protected]>

* remove

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* seems to load properly

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* Revert "updated"

This reverts commit 97316d9.

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* stash

Signed-off-by: [email protected] <[email protected]>

* added

Signed-off-by: [email protected] <[email protected]>

* diffs for local dev on macos

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* update

Signed-off-by: Robert Shaw <[email protected]>

* updaed

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* Checkpoint.

Signed-off-by: Tyler Michael Smith <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* Cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* WIP

Signed-off-by: Tyler Michael Smith <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated on scheduler side

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* Hacking away

Signed-off-by: Tyler Michael Smith <[email protected]>

* cleanup

Signed-off-by: Robert Shaw <[email protected]>

* ensure request removed from running list

Signed-off-by: Robert Shaw <[email protected]>

* Runs E2E. Garbage output. Crashes on 2nd request

Signed-off-by: Tyler Michael Smith <[email protected]>

* update

Signed-off-by: Tyler Michael Smith <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* rename files

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* update

Signed-off-by: Robert Shaw <[email protected]>

* Second request no longer crashes

Signed-off-by: Tyler Michael Smith <[email protected]>

* Remove gpu_model_runner hacks

Signed-off-by: Tyler Michael Smith <[email protected]>

* Clean up Justfile

Signed-off-by: Tyler Michael Smith <[email protected]>

* [Bugfix] Stale finished requests in EMPTY_MODEL_RUNNER_OUTPUT

Signed-off-by: Tyler Michael Smith <[email protected]>

* update

Signed-off-by: Tyler Michael Smith <[email protected]>

* justfile edits

Signed-off-by: Tyler Michael Smith <[email protected]>

* Update

Signed-off-by: Tyler Michael Smith <[email protected]>

* Fixes - lm_eval gsm8k has correctness

Signed-off-by: Tyler Michael Smith <[email protected]>

* "just delete the assert"

Signed-off-by: Tyler Michael Smith <[email protected]>

* fixup precommit issues

Signed-off-by: Tyler Michael Smith <[email protected]>

* Fixes

Signed-off-by: Tyler Michael Smith <[email protected]>

* updated (#12)

Signed-off-by: [email protected] <[email protected]>

* Add Accuracy Test (#13)

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

---------

Signed-off-by: [email protected] <[email protected]>

* Preemption Bugfixes (#15)

* stash fixed double free issue

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* fixed issue

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

---------

Signed-off-by: [email protected] <[email protected]>

* updated (#16)

Signed-off-by: [email protected] <[email protected]>

* Fix Bad Merge | Fix Memory Leak in Upstream (#18)

* updated

Signed-off-by: [email protected] <[email protected]>

* fix merge

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

---------

Signed-off-by: [email protected] <[email protected]>

* clean up justfile, examples

Signed-off-by: Tyler Michael Smith <[email protected]>

* more cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* more cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* more cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* more cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* More cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* more cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* more cleanup, precommit fixes

Signed-off-by: Tyler Michael Smith <[email protected]>

* More cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* run_accuracy_test.sh UX

Signed-off-by: Tyler Michael Smith <[email protected]>

* squash warnings

Signed-off-by: Tyler Michael Smith <[email protected]>

* pre-commit

Signed-off-by: Tyler Michael Smith <[email protected]>

* update

Signed-off-by: Tyler Michael Smith <[email protected]>

* Add get_finished to base kv connector

Signed-off-by: mgoin <[email protected]>

* revert test.txt

Signed-off-by: Tyler Michael Smith <[email protected]>

* cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* Cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* review comments

Signed-off-by: Tyler Michael Smith <[email protected]>

---------

Signed-off-by: ApostaC <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: Robert Shaw <[email protected]>
Signed-off-by: mgoin <[email protected]>
Co-authored-by: ApostaC <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Co-authored-by: mgoin <[email protected]>
Co-authored-by: mgoin <[email protected]>
* updated

Signed-off-by: [email protected] <[email protected]>

* mypy

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* stash

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* update typing

Signed-off-by: [email protected] <[email protected]>

---------

Signed-off-by: [email protected] <[email protected]>
* [V1] Support multiple kv connectors

Signed-off-by: mgoin <[email protected]>
Signed-off-by: Nick Hill <[email protected]>

* Example script

Signed-off-by: mgoin <[email protected]>

* .

Signed-off-by: mgoin <[email protected]>

* Add test

Signed-off-by: mgoin <[email protected]>

* make mypy happy

Signed-off-by: mgoin <[email protected]>

* move MultiKVConnectorMetadata to multi_connector.py

Signed-off-by: Nick Hill <[email protected]>

* minor simplifications

Signed-off-by: Nick Hill <[email protected]>

* Remove script

Signed-off-by: mgoin <[email protected]>

* michael inprogress

Signed-off-by: Nick Hill <[email protected]>

* Make sure we pop requests from connector dict

Signed-off-by: mgoin <[email protected]>

* req_id -> request_id

Signed-off-by: mgoin <[email protected]>

---------

Signed-off-by: mgoin <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Co-authored-by: mgoin <[email protected]>
* Test xPyD

Signed-off-by: Tyler Michael Smith <[email protected]>

* backwards compatibility

Signed-off-by: Tyler Michael Smith <[email protected]>

* settable from env

Signed-off-by: Tyler Michael Smith <[email protected]>

---------

Signed-off-by: Tyler Michael Smith <[email protected]>
To verify load precedence behavior

Signed-off-by: Nick Hill <[email protected]>
Co-authored-by: Michael Goin <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* cleanup code

Signed-off-by: [email protected] <[email protected]>

* cleanup code

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* stash

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updatted

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* revert

Signed-off-by: [email protected] <[email protected]>

* more spurious changes

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* Update vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

Co-authored-by: Tyler Michael Smith <[email protected]>

* Update vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

Co-authored-by: Tyler Michael Smith <[email protected]>

---------

Signed-off-by: ApostaC <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: Robert Shaw <[email protected]>
Co-authored-by: ApostaC <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

---------

Signed-off-by: [email protected] <[email protected]>

* updated (#16)

Signed-off-by: [email protected] <[email protected]>

* Fix Bad Merge | Fix Memory Leak in Upstream (#18)

* updated

Signed-off-by: [email protected] <[email protected]>

* fix merge

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

---------

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* cleanup code

Signed-off-by: [email protected] <[email protected]>

* cleanup code

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* stash

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updatted

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* revert

Signed-off-by: [email protected] <[email protected]>

* more spurious changes

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* Support MLA in NIXL connector

Signed-off-by: Tyler Michael Smith <[email protected]>

* WIP adding tests

Signed-off-by: Tyler Michael Smith <[email protected]>

* wip

Signed-off-by: Tyler Michael Smith <[email protected]>

* Fixes

Signed-off-by: Tyler Michael Smith <[email protected]>

---------

Signed-off-by: ApostaC <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: Robert Shaw <[email protected]>
Co-authored-by: ApostaC <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>

github-actions bot commented May 6, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@kouroshHakha
Collaborator

@robertgshaw2-redhat question: Does P instance block for handshake to happen? The proxy suggests that prefill does not need to know about the decode instance when the request is sent to it. Can you at high level explain the data and control flow of this architecture?

for block in self.single_type_manager.req_to_blocks[request_id]
]

def get_num_blocks(self, request_id: str):
Collaborator

Seems that this function is not used?

Collaborator Author

you're right, I will remove this.

@simon-mo simon-mo merged commit d191102 into vllm-project:main May 12, 2025
72 of 87 checks passed
@robertgshaw2-redhat
Collaborator Author

> @robertgshaw2-redhat question: Does P instance block for handshake to happen? The proxy suggests that prefill does not need to know about the decode instance when the request is sent to it. Can you at high level explain the data and control flow of this architecture?

Here's the doc:

@kouroshHakha
Collaborator

@robertgshaw2-redhat thanks, requested access.

diabloneo added a commit to diabloneo/VUA that referenced this pull request May 13, 2025
The KVConnector API have been updated after merging the integration
of NIXL in this PR: vllm-project/vllm#17751

Update VUA's method signatures to match the new API.

Signed-off-by: diabloneo <[email protected]>
mawong-amd pushed a commit to ROCm/vllm that referenced this pull request May 14, 2025
Signed-off-by: ApostaC <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: Robert Shaw <[email protected]>
Signed-off-by: mgoin <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Brent Salisbury <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: ApostaC <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Co-authored-by: mgoin <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: Brent Salisbury <[email protected]>
zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025
Signed-off-by: ApostaC <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: Robert Shaw <[email protected]>
Signed-off-by: mgoin <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Brent Salisbury <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: ApostaC <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Co-authored-by: mgoin <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: Brent Salisbury <[email protected]>
Signed-off-by: Yuqi Zhang <[email protected]>
@LCAIZJ

LCAIZJ commented May 24, 2025

@robertgshaw2-redhat Can you open access to this document? https://docs.google.com/document/d/1lDQt6hUXBoMnMabsuAMZAWdqit7TzuflsFGHLt2gZus/edit?tab=t.0

@zhaohaidao
Contributor

> @robertgshaw2-redhat question: Does P instance block for handshake to happen? The proxy suggests that prefill does not need to know about the decode instance when the request is sent to it. Can you at high level explain the data and control flow of this architecture?
>
> Here's the doc:

@robertgshaw2-redhat Hi, can you grant me access? I have requested it.

@new-TonyWang

Hello, and excuse me. While learning about the NIXL integration, I came up with a question about this PR: how does WAITING_FOR_REMOTE_KVS work in P/D disaggregation, and why is it different from other solutions (like the simple connector)? Looking forward to your reply. Thank you very much.

minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025
Signed-off-by: ApostaC <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: Robert Shaw <[email protected]>
Signed-off-by: mgoin <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Brent Salisbury <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: ApostaC <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Co-authored-by: mgoin <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: Brent Salisbury <[email protected]>
Signed-off-by: minpeter <[email protected]>
@AsicDyc

AsicDyc commented Jul 31, 2025

> @robertgshaw2-redhat question: Does P instance block for handshake to happen? The proxy suggests that prefill does not need to know about the decode instance when the request is sent to it. Can you at high level explain the data and control flow of this architecture?
>
> Here's the doc:

Hey, can you grant me access?

@david6666666
Contributor

> @robertgshaw2-redhat question: Does P instance block for handshake to happen? The proxy suggests that prefill does not need to know about the decode instance when the request is sent to it. Can you at high level explain the data and control flow of this architecture?
>
> Here's the doc:

Hey, can you grant me access?

Labels: ci/build, frontend, ready (ONLY add when PR is ready to merge/full CI is needed), v1