[V1][Metrics] add support for kv event publishing #16750
This pull request has merge conflicts that must be resolved before it can be |
@alec-flowers re the custom stats loggers, there's another PR for this which is hopefully close to being merged: #14661. As you alluded to, we need to take factories/constructors for the multi-engine case (or otherwise change the stats logger API to include an engine id).
@njhill Awesome! Looks like there has been some good discussion around the API. I will remove my WIP stuff from this PR and reference that PR.
cc @markmc - FYI, this feature is to enable creating a global KV cache view in NVIDIA Dynamo.
This generally looks very reasonable to me. It would be useful to add an example of a Logger that uses the events, for testing purposes. Otherwise this will break. Is there a reason why you don't want to implement the KVCacheEventsLogger inside vLLM?
The main reason is to avoid adding Dynamo (or pieces of Dynamo) as a dependency of vLLM. For our logger, we would need to add a Rust-based publisher (with Python bindings) from Dynamo that sends the events over NATS to another component. I'm definitely flexible and open to ideas here. And good reminder: I will add an integration test with a mock StatLogger that will break if the API changes too much.
Quick question: how is KVCacheEvent published? I saw it's passed into the metrics logger, but it seems like it's not used?
vllm/v1/engine/__init__.py

    class KVCacheEvent(
            msgspec.Struct,
            array_like=True,  # type: ignore[call-arg]
            omit_defaults=True,  # type: ignore[call-arg]
            gc=False,  # type: ignore[call-arg]
            tag=True):
        """Base class for all KV cache-related events"""


    class BlockStored(KVCacheEvent):
        block_hashes: list[int]
        parent_block_hash: Optional[int]
        token_ids: list[int]
        num_toks_per_block: list[int]
        lora_id: Optional[int]


    class BlockRemoved(KVCacheEvent):
        block_hashes: list[int]


    class AllBlocksCleared(KVCacheEvent):
        pass
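To make the purpose of these three event types concrete, here is a minimal sketch of how a downstream consumer might fold them into a global view of which worker holds which blocks. It is only an illustration: the event classes are plain-dataclass stand-ins that copy the field names quoted above (so the sketch runs without msgspec), and `GlobalKVIndex`, `apply`, `workers_with`, and the worker ids are hypothetical names that do not appear in the PR.

```python
from dataclasses import dataclass
from typing import Optional, Union


# Stand-ins for the msgspec structs quoted above (assumption: same field
# names; plain dataclasses are used so the sketch needs only the stdlib).
@dataclass
class BlockStored:
    block_hashes: list[int]
    parent_block_hash: Optional[int]
    token_ids: list[int]
    num_toks_per_block: list[int]
    lora_id: Optional[int]


@dataclass
class BlockRemoved:
    block_hashes: list[int]


@dataclass
class AllBlocksCleared:
    pass


KVEvent = Union[BlockStored, BlockRemoved, AllBlocksCleared]


class GlobalKVIndex:
    """Hypothetical consumer: tracks which workers hold which block hashes."""

    def __init__(self) -> None:
        self._blocks: dict[str, set[int]] = {}

    def apply(self, worker_id: str, event: KVEvent) -> None:
        """Update the set of block hashes held by one worker."""
        held = self._blocks.setdefault(worker_id, set())
        if isinstance(event, BlockStored):
            held.update(event.block_hashes)
        elif isinstance(event, BlockRemoved):
            held.difference_update(event.block_hashes)
        elif isinstance(event, AllBlocksCleared):
            held.clear()

    def workers_with(self, block_hash: int) -> list[str]:
        """Return the sorted ids of workers currently holding a block."""
        return sorted(w for w, held in self._blocks.items()
                      if block_hash in held)


index = GlobalKVIndex()
index.apply("worker-0", BlockStored([11, 12], None, [1, 2, 3, 4], [2, 2], None))
index.apply("worker-1", BlockStored([12], 11, [3, 4], [2], None))
index.apply("worker-0", BlockRemoved([11]))
```

After these three events, block 12 is held by both workers and block 11 by neither; an `AllBlocksCleared` event for a worker empties its set.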
Should those events be defined in a source file that is more related to KV cache management? For example, kv_cache_utils.py.
Re my other comment on #16669: if this is a hook to capture stored/removed KV cache events so they can be sent elsewhere, wouldn't the KV connector API be a better fit? Then there would be no need to route these events to a logger in the frontend.
@ApostaC It would be published by a Logger defined by a 3rd party. You would be free to write your own that publishes it however you want. That is why, in this case, we would be reliant on #14661, which adds support for passing in a logger at runtime to the AsyncLLM class.
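As a rough illustration of that idea, the sketch below shows a third-party logger that forwards events to a caller-supplied sink. Everything here is an assumption made for illustration: the `record(stats, events)` signature, the `EventForwardingLogger` name, the JSON payload layout, and the per-engine factory are hypothetical and are not the stat logger interface from #14661 or this PR.

```python
import json
from typing import Any, Callable, Optional


class EventForwardingLogger:
    """Hypothetical third-party logger; NOT vLLM's actual logger interface."""

    def __init__(self, engine_index: int,
                 send: Callable[[bytes], None]) -> None:
        self.engine_index = engine_index
        self._send = send  # e.g. a message-bus publish function
        self._seq = 0      # sequence number so consumers can detect gaps

    def record(self, stats: Optional[dict[str, Any]],
               events: list[dict[str, Any]]) -> None:
        """Forward a batch of KV cache events; skip empty batches."""
        if not events:
            return
        payload = {
            "engine": self.engine_index,
            "seq": self._seq,
            "events": events,
        }
        self._seq += 1
        self._send(json.dumps(payload).encode("utf-8"))


sent: list[bytes] = []


def make_logger(engine_index: int) -> EventForwardingLogger:
    """Factory taking an engine index, one logger per engine (assumed shape)."""
    return EventForwardingLogger(engine_index, sent.append)


logger = make_logger(0)
logger.record(None, [])  # nothing to publish
logger.record(None, [{"type": "BlockRemoved", "block_hashes": [1, 2]}])
```

Passing a factory rather than a ready-made instance lets each engine get its own logger and engine index, which is the multi-engine concern raised earlier in the thread.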
@alec-flowers it looks like there may be an issue with the |
I'm not able to replicate locally. When I start the publishing thread, daemon=True is set:

    self._thread = threading.Thread(target=self._publisher_thread,
                                    daemon=True,
                                    name="zmq-publisher")

I will try a few things out.
Signed-off-by: alec-flowers <[email protected]>
Took me some time, but I isolated the problem to the previous test, test_engine_core_client_asyncio, which was leaving a thread open with a zmq socket that was causing problems with my test. I believe I've fixed it. @njhill I'm having some problems with CI. It seems the last run failed in the build stage. I'm also constantly having to rebase due to a large import block. Hopefully this run goes smoothly.
Thanks @alec-flowers, sorry you had to take extra time to debug that. I think it's a lingering issue where the client destructor doesn't always get run. In theory the explicit shutdown shouldn't be needed but makes sense to include it here for now.
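The pattern under discussion, a daemon publisher thread plus an explicit shutdown rather than reliance on a destructor, can be sketched roughly as follows. This is a stdlib-only illustration, not the PR's publisher: the zmq socket is replaced by a plain `send` callable, and `BackgroundPublisher`, `publish`, and `shutdown` are hypothetical names.

```python
import queue
import threading
from typing import Callable, Optional


class BackgroundPublisher:
    """Drains queued messages on a daemon thread; shutdown() stops it."""

    _STOP = object()  # sentinel telling the thread to exit

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self._send = send  # stand-in for a socket send (assumption)
        self._queue: "queue.Queue[object]" = queue.Queue()
        self._thread = threading.Thread(target=self._publisher_thread,
                                        daemon=True,
                                        name="publisher")
        self._thread.start()

    def publish(self, message: bytes) -> None:
        """Enqueue a message without blocking the caller on I/O."""
        self._queue.put(message)

    def _publisher_thread(self) -> None:
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            self._send(item)

    def shutdown(self, timeout: Optional[float] = 5.0) -> None:
        """Flush everything queued before the sentinel, then join."""
        self._queue.put(self._STOP)
        self._thread.join(timeout)


sent: list[bytes] = []
publisher = BackgroundPublisher(sent.append)
publisher.publish(b"a")
publisher.publish(b"b")
publisher.shutdown()
```

Setting daemon=True keeps a forgotten publisher from blocking interpreter exit, but it does not release resources held by the thread while the process keeps running (for example, across tests in one session), which is why an explicit `shutdown()` is still worth calling.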
Signed-off-by: alec-flowers <[email protected]> Signed-off-by: Mark McLoughlin <[email protected]> Co-authored-by: Mark McLoughlin <[email protected]>
RFC: KVBlocks and Metrics Publishing In Inference Frameworks
API
- add external_stat_loggers field to AsyncLLM API
  Covered by [V1][Metrics] Allow V1 AsyncLLM to use custom logger #14661
With #14661 and this PR, a 3rd party can write a custom stat logger to consume both engine Stats and Events and publish them elsewhere.