Skip to content

[V1][Metrics] add support for kv event publishing #16750

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Apr 30, 2025

Conversation

alec-flowers
Copy link
Contributor

@alec-flowers alec-flowers commented Apr 17, 2025

RFC: KVBlocks and Metrics Publishing In Inference Frameworks

  • Added KVCacheEvent, BlockStored, BlockRemoved, and AllBlocksCleared msgspec classes
  • Created a queue in the BlockPool and write these events in the appropriate functions
  • Bubble the events up to the scheduler where they are appended to EngineCoreOutputs
  • Add kv_cach_events to EngineCoreOutputs
  • Wrote unit tests at the BlockManager level to test basic functionality and at the EngineCore level testing correct propagation and serializing over zmq.

API

With #14661 and this PR a 3rd party can write a custom stat logger to consume both engine Stats and Events and publish them elsewhere.

Copy link

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label Apr 17, 2025
Copy link

mergify bot commented Apr 17, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @alec-flowers.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Apr 17, 2025
@njhill
Copy link
Member

njhill commented Apr 17, 2025

@alec-flowers re the custom stats loggers, there's another PR for this which is hopefully close to being merged: #14661. As you alluded to, we need to take factories/constructors for the multi-engine case (or otherwise change the stats logger API to include an an engine id).

@alec-flowers
Copy link
Contributor Author

@alec-flowers re the custom stats loggers, there's another PR for this which is hopefully close to being merged: #14661. As you alluded to, we need to take factories/constructors for the multi-engine case (or otherwise change the stats logger API to include an an engine id).

@njhill Awesome! Looks like there has been some good discussion around the API. I will remove my WIP stuff from this PR and reference that PR.

@alec-flowers alec-flowers changed the title [Misc] add support for kv event publishing and custom statloggers [V1][Misc] add support for kv event publishing and custom statloggers Apr 17, 2025
@alec-flowers alec-flowers changed the title [V1][Misc] add support for kv event publishing and custom statloggers [V1][Metrics] add support for kv event publishing and custom statloggers Apr 17, 2025
@alec-flowers alec-flowers changed the title [V1][Metrics] add support for kv event publishing and custom statloggers [V1][Metrics] add support for kv event publishing Apr 17, 2025
@mergify mergify bot removed the needs-rebase label Apr 17, 2025
@alec-flowers alec-flowers marked this pull request as ready for review April 17, 2025 21:03
@robertgshaw2-redhat
Copy link
Collaborator

cc @markmc - FYI, this feature it to enable creating a global KV cache view in NVIDIA Dynamo

@robertgshaw2-redhat
Copy link
Collaborator

This generally looks very reasonable to me.

It would be useful to add an example of the Logger which uses the the events, for testing purposes. Otherwise this will break.

Is there a reason why you don't want to implement the KVCacheEventsLogger inside VLLM?

@alec-flowers
Copy link
Contributor Author

This generally looks very reasonable to me.

It would be useful to add an example of the Logger which uses the the events, for testing purposes. Otherwise this will break.

Is there a reason why you don't want to implement the KVCacheEventsLogger inside VLLM?

Main reason is to avoid adding dynamo (or pieces of dynamo) as a dependency to vLLM. For our logger, we would need to add in a rust based publisher (with python bindings) from Dynamo that sends the events over NATS to another component.
https://github.com/ai-dynamo/dynamo/blob/main/lib/llm/src/kv_router/publisher.rs
https://github.com/ai-dynamo/dynamo/blob/main/lib/bindings/python/src/dynamo/_core.pyi#L497

I'm definitely flexible and open to ideas here.

And good reminder, I will add in an integration test with a mock StatLogger that will break if the API changes too much.

Copy link
Collaborator

@ApostaC ApostaC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question: how is KVCacheEvent published? I saw it's passed into the metrics logger, but seems like it's not used?

Comment on lines 90 to 112
class KVCacheEvent(
msgspec.Struct,
array_like=True, # type: ignore[call-arg]
omit_defaults=True, # type: ignore[call-arg]
gc=False, # type: ignore[call-arg]
tag=True):
"""Base class for all KV cache-related events"""


class BlockStored(KVCacheEvent):
block_hashes: list[int]
parent_block_hash: Optional[int]
token_ids: list[int]
num_toks_per_block: list[int]
lora_id: Optional[int]


class BlockRemoved(KVCacheEvent):
block_hashes: list[int]


class AllBlocksCleared(KVCacheEvent):
pass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should those events be defined in a source file that is more related to KV cache management? For example, kv_cache_utils.py

@markmc
Copy link
Member

markmc commented Apr 18, 2025

Main reason is to avoid adding dynamo (or pieces of dynamo) as a dependency to vLLM. For our logger, we would need to add in a rust based publisher (with python bindings) from Dynamo that sends the events over NATS to another component. https://github.com/ai-dynamo/dynamo/blob/main/lib/llm/src/kv_router/publisher.rs https://github.com/ai-dynamo/dynamo/blob/main/lib/bindings/python/src/dynamo/_core.pyi#L497

I'm definitely flexible and open to ideas here.

re my other comment #16669 (comment)

If this is a hook to capture stored/removed KV cache events so they can be sent elsewhere, wouldn't the KV connector API be a better fit - no need to route these events to a logger in the frontend for this?

@alec-flowers
Copy link
Contributor Author

alec-flowers commented Apr 18, 2025

Quick question: how is KVCacheEvent published? I saw it's passed into the metrics logger, but seems like it's not used?

@ApostaC It would be published by a 3rd party defined Logger. You would be free to write your own that would publish it how you want. That is why in this case we would be reliant on this issue - #14661, which adds support for passing in a logger at runtime to the AysncLLM class.

@njhill
Copy link
Member

njhill commented Apr 28, 2025

@alec-flowers it looks like there may be an issue with the test_kv_cache_events test timing out: https://buildkite.com/vllm/ci/builds/18784#01967c2d-91f5-42bb-b3ae-ce520c3ddb11/210-1613. It might help to set daemon=True when starting the publishing thread.

@alec-flowers
Copy link
Contributor Author

@alec-flowers it looks like there may be an issue with the test_kv_cache_events test timing out: https://buildkite.com/vllm/ci/builds/18784#01967c2d-91f5-42bb-b3ae-ce520c3ddb11/210-1613. It might help to set daemon=True when starting the publishing thread.

I'm not able to replicate locally
image

When I start the publishing thread the daemon=True is set

 self._thread = threading.Thread(target=self._publisher_thread,
                                        daemon=True,
                                        name="zmq-publisher")

will try a few things out.

@alec-flowers alec-flowers force-pushed the kv-event-publishing branch 3 times, most recently from 560b6bf to b534614 Compare April 29, 2025 08:16
Copy link

mergify bot commented Apr 29, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @alec-flowers.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Apr 29, 2025
@mergify mergify bot removed the needs-rebase label Apr 29, 2025
Copy link

mergify bot commented Apr 30, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @alec-flowers.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Apr 30, 2025
@mergify mergify bot removed the needs-rebase label Apr 30, 2025
@alec-flowers
Copy link
Contributor Author

Took me some time, but isolated the problem to be the previous test_engine_core_client_asyncio which was leaving a thread open with a zmq socket that was causing problems with my test. I believe I've fixed it.

@njhill I'm having some problems with CI. It seems the last run failed in the build stage. I'm also constantly having to re-base due to a large import block. Hopefully this run goes smooth.

@njhill
Copy link
Member

njhill commented Apr 30, 2025

Thanks @alec-flowers, sorry you had to take extra time to debug that. I think it's a lingering issue where the client destructor doesn't always get run. In theory the explicit shutdown shouldn't be needed but makes sense to include it here for now.

@njhill njhill merged commit 0be6d05 into vllm-project:main Apr 30, 2025
53 checks passed
radeksm pushed a commit to radeksm/vllm that referenced this pull request May 2, 2025
Signed-off-by: alec-flowers <[email protected]>
Signed-off-by: Mark McLoughlin <[email protected]>
Co-authored-by: Mark McLoughlin <[email protected]>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
Signed-off-by: alec-flowers <[email protected]>
Signed-off-by: Mark McLoughlin <[email protected]>
Co-authored-by: Mark McLoughlin <[email protected]>
Signed-off-by: Mu Huai <[email protected]>
zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025
Signed-off-by: alec-flowers <[email protected]>
Signed-off-by: Mark McLoughlin <[email protected]>
Co-authored-by: Mark McLoughlin <[email protected]>
Signed-off-by: Yuqi Zhang <[email protected]>
minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025
Signed-off-by: alec-flowers <[email protected]>
Signed-off-by: Mark McLoughlin <[email protected]>
Co-authored-by: Mark McLoughlin <[email protected]>
Signed-off-by: minpeter <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation ready ONLY add when PR is ready to merge/full CI is needed v1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants