[V1][Metrics] add support for kv event publishing #16750

alec-flowers · 2025-04-17T02:05:02Z

RFC: KVBlocks and Metrics Publishing In Inference Frameworks

- Added KVCacheEvent, BlockStored, BlockRemoved, and AllBlocksCleared msgspec classes.
- Created a queue in the BlockPool; the appropriate functions write these events to it.
- Bubbled the events up to the scheduler, where they are appended to EngineCoreOutputs.
- Added kv_cache_events to EngineCoreOutputs.
- Wrote unit tests at the BlockManager level to test basic functionality, and at the EngineCore level to test correct propagation and serialization over zmq.
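
The flow described in the list above can be sketched roughly as follows. This is a minimal, standard-library illustration and not the code from this PR: `ToyBlockPool`, `cache_block`, `evict_block`, `take_events`, and `scheduler_step` are hypothetical names, and the events are plain dataclasses rather than msgspec structs.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class BlockStored:
    block_hashes: list[int]
    parent_block_hash: Optional[int] = None


@dataclass
class BlockRemoved:
    block_hashes: list[int]


KVEvent = Union[BlockStored, BlockRemoved]


class ToyBlockPool:
    """Hypothetical stand-in for a block pool that records KV cache events."""

    def __init__(self, enable_kv_cache_events: bool = False) -> None:
        self.enable_kv_cache_events = enable_kv_cache_events
        self._events: deque[KVEvent] = deque()

    def cache_block(self, block_hash: int, parent: Optional[int] = None) -> None:
        # Queue an event only when the feature flag is on.
        if self.enable_kv_cache_events:
            self._events.append(BlockStored([block_hash], parent))

    def evict_block(self, block_hash: int) -> None:
        if self.enable_kv_cache_events:
            self._events.append(BlockRemoved([block_hash]))

    def take_events(self) -> list[KVEvent]:
        """Return the queued events in order and clear the queue."""
        events = list(self._events)
        self._events.clear()
        return events


def scheduler_step(pool: ToyBlockPool) -> dict:
    # Hypothetical scheduler step: do some cache work, then attach the
    # drained events to the per-step output.
    pool.cache_block(101)
    pool.cache_block(102, parent=101)
    pool.evict_block(101)
    return {"kv_cache_events": pool.take_events()}
```

Draining the queue once per step keeps events ordered and means a disabled flag costs only a boolean check per cache operation.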

API

- Add enable_kv_cache_events to EngineArgs
- ~~add external_stat_loggers field to AsyncLLM API~~ Covered by [V1][Metrics] Allow V1 AsyncLLM to use custom logger #14661

With #14661 and this PR, a third party can write a custom stat logger that consumes both engine stats and events and publishes them elsewhere.
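
As a rough illustration of that idea, a third-party logger might look something like the sketch below. Everything here is an assumption made for illustration: the class name `ForwardingStatLogger`, its `record` signature, and the message shape are hypothetical, and the real logger interface is whatever #14661 defines.

```python
from typing import Any, Callable


class ForwardingStatLogger:
    """Hypothetical logger that forwards stats and KV cache events to a sink.

    The sink is any callable, for example a function that writes to a
    message bus. This is not the interface from #14661.
    """

    def __init__(self, publish: Callable[[dict[str, Any]], None]) -> None:
        self._publish = publish

    def record(self, stats: dict[str, Any], kv_cache_events: list[Any]) -> None:
        # Forward the stats snapshot first, then each event in order.
        self._publish({"kind": "stats", "payload": stats})
        for event in kv_cache_events:
            self._publish({"kind": "kv_event", "payload": event})


# Usage: collect messages in a list instead of sending them anywhere.
sink: list[dict[str, Any]] = []
stat_logger = ForwardingStatLogger(sink.append)
stat_logger.record({"num_running_reqs": 2}, [("BlockRemoved", [101])])
```

Injecting the publish function keeps the transport (NATS, zmq, or anything else) out of the logger itself, which is in line with the goal of not adding such dependencies to vLLM.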

github-actions · 2025-04-17T02:05:13Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

mergify · 2025-04-17T02:54:45Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @alec-flowers.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

njhill · 2025-04-17T14:29:26Z

@alec-flowers re the custom stats loggers, there's another PR for this which is hopefully close to being merged: #14661. As you alluded to, we need to take factories/constructors for the multi-engine case (or otherwise change the stats logger API to include an engine id).

alec-flowers · 2025-04-17T16:34:02Z

> @alec-flowers re the custom stats loggers, there's another PR for this which is hopefully close to being merged: #14661. As you alluded to, we need to take factories/constructors for the multi-engine case (or otherwise change the stats logger API to include an engine id).

@njhill Awesome! Looks like there has been some good discussion around the API. I will remove my WIP stuff from this PR and reference that PR.

robertgshaw2-redhat · 2025-04-17T21:55:17Z

cc @markmc - FYI, this feature is to enable creating a global KV cache view in NVIDIA Dynamo

robertgshaw2-redhat · 2025-04-17T22:03:33Z

This generally looks very reasonable to me.

It would be useful to add an example of a logger that uses the events, for testing purposes. Otherwise this will break.

Is there a reason why you don't want to implement the KVCacheEventsLogger inside vLLM?

alec-flowers · 2025-04-17T22:48:52Z

> This generally looks very reasonable to me.
>
> It would be useful to add an example of a logger that uses the events, for testing purposes. Otherwise this will break.
>
> Is there a reason why you don't want to implement the KVCacheEventsLogger inside vLLM?

The main reason is to avoid adding Dynamo (or pieces of Dynamo) as a dependency of vLLM. For our logger, we would need to add a Rust-based publisher (with Python bindings) from Dynamo that sends the events over NATS to another component.
https://github.com/ai-dynamo/dynamo/blob/main/lib/llm/src/kv_router/publisher.rs
https://github.com/ai-dynamo/dynamo/blob/main/lib/bindings/python/src/dynamo/_core.pyi#L497

I'm definitely flexible and open to ideas here.

And good reminder, I will add in an integration test with a mock StatLogger that will break if the API changes too much.

ApostaC

Quick question: how is KVCacheEvent published? I saw it's passed into the metrics logger, but seems like it's not used?

ApostaC · 2025-04-18T01:10:16Z

vllm/v1/engine/__init__.py

+class KVCacheEvent(
+        msgspec.Struct,
+        array_like=True,  # type: ignore[call-arg]
+        omit_defaults=True,  # type: ignore[call-arg]
+        gc=False,  # type: ignore[call-arg]
+        tag=True):
+    """Base class for all KV cache-related events"""
+
+
+class BlockStored(KVCacheEvent):
+    block_hashes: list[int]
+    parent_block_hash: Optional[int]
+    token_ids: list[int]
+    num_toks_per_block: list[int]
+    lora_id: Optional[int]
+
+
+class BlockRemoved(KVCacheEvent):
+    block_hashes: list[int]
+
+
+class AllBlocksCleared(KVCacheEvent):
+    pass
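
For readers unfamiliar with msgspec's `array_like=True` and `tag=True` options: together they serialize a struct as an array whose first element is the tag (by default the class name), followed by the field values in declaration order. The sketch below imitates that wire shape using only the standard library and JSON. It is an illustration of the layout, not msgspec itself; `encode`, `decode`, and `EVENT_TYPES` are hypothetical helpers, and `omit_defaults` is not modeled.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class BlockStored:
    block_hashes: list[int]
    parent_block_hash: Optional[int]
    token_ids: list[int]
    num_toks_per_block: list[int]
    lora_id: Optional[int]


@dataclass
class BlockRemoved:
    block_hashes: list[int]


@dataclass
class AllBlocksCleared:
    pass


# Map each tag (the class name) back to its class for decoding.
EVENT_TYPES = {
    cls.__name__: cls for cls in (BlockStored, BlockRemoved, AllBlocksCleared)
}


def encode(event) -> str:
    """Serialize as [tag, field1, field2, ...] with no field names."""
    values = [getattr(event, f.name) for f in fields(event)]
    return json.dumps([type(event).__name__, *values])


def decode(data: str):
    """Use the leading tag to pick the class, then fill fields by position."""
    tag, *values = json.loads(data)
    return EVENT_TYPES[tag](*values)
```

Because field names never go on the wire, this layout is compact, but producer and consumer must agree on field order.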


Should those events be defined in a source file that is more related to KV cache management? For example, kv_cache_utils.py

markmc · 2025-04-18T11:57:10Z

> The main reason is to avoid adding Dynamo (or pieces of Dynamo) as a dependency of vLLM. For our logger, we would need to add a Rust-based publisher (with Python bindings) from Dynamo that sends the events over NATS to another component.
> https://github.com/ai-dynamo/dynamo/blob/main/lib/llm/src/kv_router/publisher.rs
> https://github.com/ai-dynamo/dynamo/blob/main/lib/bindings/python/src/dynamo/_core.pyi#L497
>
> I'm definitely flexible and open to ideas here.

re my other comment #16669 (comment)

If this is a hook to capture stored/removed KV cache events so they can be sent elsewhere, wouldn't the KV connector API be a better fit - no need to route these events to a logger in the frontend for this?

alec-flowers · 2025-04-18T18:34:30Z

> Quick question: how is KVCacheEvent published? I saw it's passed into the metrics logger, but seems like it's not used?

@ApostaC It would be published by a logger defined by a third party. You would be free to write your own logger that publishes it however you want. That is why, in this case, we rely on #14661, which adds support for passing in a logger at runtime to the AsyncLLM class.

njhill · 2025-04-28T16:56:13Z

@alec-flowers it looks like there may be an issue with the test_kv_cache_events test timing out: https://buildkite.com/vllm/ci/builds/18784#01967c2d-91f5-42bb-b3ae-ce520c3ddb11/210-1613. It might help to set daemon=True when starting the publishing thread.

alec-flowers · 2025-04-28T19:39:23Z

> @alec-flowers it looks like there may be an issue with the test_kv_cache_events test timing out: https://buildkite.com/vllm/ci/builds/18784#01967c2d-91f5-42bb-b3ae-ce520c3ddb11/210-1613. It might help to set daemon=True when starting the publishing thread.

I'm not able to replicate this locally.

daemon=True is already set when I start the publishing thread:

    self._thread = threading.Thread(target=self._publisher_thread,
                                    daemon=True,
                                    name="zmq-publisher")

I will try a few things out.
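
For context, `daemon=True` only stops a thread from blocking interpreter exit; within a long-running test session the thread keeps running until it is told to stop. A minimal sketch of a publisher thread with an explicit shutdown path is shown below. It assumes nothing about the actual implementation in this PR: `ToyPublisher` and its methods are hypothetical, and a plain callable stands in for the zmq socket.

```python
import queue
import threading
from typing import Any, Callable

_STOP = object()  # sentinel that tells the worker thread to exit


class ToyPublisher:
    """Hypothetical publisher: a queue drained by a background thread."""

    def __init__(self, send: Callable[[Any], None]) -> None:
        self._send = send
        self._queue: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._publisher_thread,
                                        daemon=True,
                                        name="toy-publisher")
        self._thread.start()

    def publish(self, event: Any) -> None:
        self._queue.put(event)

    def _publisher_thread(self) -> None:
        while True:
            item = self._queue.get()
            if item is _STOP:
                return
            self._send(item)

    def shutdown(self, timeout: float = 1.0) -> None:
        # The sentinel is queued behind any pending events, so they are
        # sent before the thread exits.
        self._queue.put(_STOP)
        self._thread.join(timeout)
```

With this shape, a test can call `shutdown()` in its teardown and then check that the thread is no longer alive, instead of relying on process exit to clean it up.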

mergify · 2025-04-29T20:54:58Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @alec-flowers.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · 2025-04-30T02:45:54Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @alec-flowers.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

alec-flowers · 2025-04-30T06:24:58Z

It took me some time, but I isolated the problem to the preceding test, test_engine_core_client_asyncio, which was leaving a thread open with a zmq socket that interfered with my test. I believe I've fixed it.

@njhill I'm having some problems with CI. It seems the last run failed in the build stage. I'm also constantly having to rebase due to a large import block. Hopefully this run goes smoothly.

njhill · 2025-04-30T14:43:37Z

Thanks @alec-flowers, sorry you had to take extra time to debug that. I think it's a lingering issue where the client destructor doesn't always get run. In theory the explicit shutdown shouldn't be needed, but it makes sense to include it here for now.
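
The general pattern behind that advice, an explicit shutdown in a `finally` block rather than reliance on a destructor, can be sketched as follows. `ToyClient` and `run_with_cleanup` are hypothetical names used only for illustration and are not the engine core client API.

```python
from typing import Any, Callable


class ToyClient:
    """Hypothetical client that owns a background resource."""

    def __init__(self) -> None:
        self.open = True

    def shutdown(self) -> None:
        # Safe to call more than once.
        self.open = False


def run_with_cleanup(client: ToyClient, body: Callable[[ToyClient], Any]) -> Any:
    """Run a test body and always shut the client down afterwards."""
    try:
        return body(client)
    finally:
        # Runs whether the body returns or raises, so no thread or socket
        # is left behind for the next test.
        client.shutdown()
```

Python does not guarantee when, or whether, `__del__` runs for objects that are still referenced or caught in reference cycles, which is why the explicit call is the more predictable option in tests.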

mergify bot added the v1 label Apr 17, 2025

alec-flowers force-pushed the kv-event-publishing branch from ea3a2ec to 174c1fc Compare April 17, 2025 02:19

mergify bot added the needs-rebase label Apr 17, 2025

alec-flowers changed the title ~~[Misc] add support for kv event publishing and custom statloggers~~ [V1][Misc] add support for kv event publishing and custom statloggers Apr 17, 2025

alec-flowers changed the title ~~[V1][Misc] add support for kv event publishing and custom statloggers~~ [V1][Metrics] add support for kv event publishing and custom statloggers Apr 17, 2025

alec-flowers changed the title ~~[V1][Metrics] add support for kv event publishing and custom statloggers~~ [V1][Metrics] add support for kv event publishing Apr 17, 2025

alec-flowers force-pushed the kv-event-publishing branch from 04b609d to 7345640 Compare April 17, 2025 20:39

mergify bot removed the needs-rebase label Apr 17, 2025

alec-flowers force-pushed the kv-event-publishing branch from 7345640 to 925db1d Compare April 17, 2025 21:03

alec-flowers marked this pull request as ready for review April 17, 2025 21:03

alec-flowers requested review from WoosukKwon, robertgshaw2-redhat, njhill, ywang96, comaniac and alexm-redhat as code owners April 17, 2025 21:03

robertgshaw2-redhat assigned markmc Apr 17, 2025

robertgshaw2-redhat reviewed Apr 17, 2025

robertgshaw2-redhat reviewed Apr 17, 2025

ApostaC reviewed Apr 18, 2025

alec-flowers force-pushed the kv-event-publishing branch from 7d37b37 to ddae327 Compare April 23, 2025 05:23

fixes, shutdown to kv_event_test

7fe1145

alec-flowers force-pushed the kv-event-publishing branch 3 times, most recently from 560b6bf to b534614 Compare April 29, 2025 08:16

fix: attempt to fix hanging test

7ec5feb

alec-flowers force-pushed the kv-event-publishing branch from b534614 to 7ec5feb Compare April 29, 2025 09:30

fix: leaked thread from test_engine_core_client_asyncio

5117617

alec-flowers force-pushed the kv-event-publishing branch from 52bf679 to 5117617 Compare April 29, 2025 20:54

mergify bot added the needs-rebase label Apr 29, 2025

Merge branch 'main' into kv-event-publishing

1a5bf9b

mergify bot removed the needs-rebase label Apr 29, 2025

mergify bot added the needs-rebase label Apr 30, 2025

Merge branch 'main' into kv-event-publishing

8324a74

mergify bot removed the needs-rebase label Apr 30, 2025

njhill merged commit 0be6d05 into vllm-project:main Apr 30, 2025
53 checks passed

trevor-m mentioned this pull request May 7, 2025

[Metrics] Add KV events publishing sgl-project/sglang#6098

Merged

6 tasks

markmc mentioned this pull request May 21, 2025

[Bugfix][Failing Test] Fix test_events.py #18460

Merged

alec-flowers mentioned this pull request May 23, 2025

feat: add KV Event Publishing to vLLM v1 ai-dynamo/dynamo#1181

Merged
