
[FEATURE]: Add KVBM token hit-rate metric #4840

@flpanbin

Description


Feature request

Add a cumulative counter for requested tokens (e.g. kvbm_requested_tokens_total) so users can compute the KVBM token hit-rate in Prometheus/Grafana as matched_tokens / requested_tokens. The counter should be registered in the KVBM metrics registry and incremented once per incoming request, by that request's number of input tokens.
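To make the proposed semantics concrete, here is a minimal stdlib-only sketch. The class and method names (KvbmTokenCounters, on_request) are hypothetical and are not KVBM's actual metrics API. The sketch assumes the matched-token count for a request never exceeds that request's input length. Both values behave like Prometheus counters: they only ever increase, and each request updates them exactly once.

```python
import threading
from typing import Optional


class KvbmTokenCounters:
    """Hypothetical stand-in for the KVBM metrics registry (not its real API)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.requested_tokens_total = 0  # proposed kvbm_requested_tokens_total
        self.matched_tokens = 0          # existing kvbm_matched_tokens

    def on_request(self, num_input_tokens: int, num_matched_tokens: int) -> None:
        """Record one incoming request; call exactly once per request."""
        # Assumption: a request cannot match more tokens than it contains.
        if num_input_tokens < 0 or not 0 <= num_matched_tokens <= num_input_tokens:
            raise ValueError("expected 0 <= matched tokens <= input tokens")
        with self._lock:
            self.requested_tokens_total += num_input_tokens  # the new denominator
            self.matched_tokens += num_matched_tokens

    def cumulative_hit_rate(self) -> Optional[float]:
        """Lifetime hit-rate; None until at least one token has been requested."""
        with self._lock:
            if self.requested_tokens_total == 0:
                return None
            return self.matched_tokens / self.requested_tokens_total


counters = KvbmTokenCounters()
counters.on_request(num_input_tokens=1000, num_matched_tokens=750)
counters.on_request(num_input_tokens=500, num_matched_tokens=0)
print(counters.cumulative_hit_rate())  # 750 / 1500 -> 0.5
```

Dashboards would not use the lifetime ratio directly. They would apply rate() to both counters so the hit-rate reflects recent traffic.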

Describe the problem you're encountering

Currently the repo exposes kvbm_matched_tokens and several offload/onboard block counters (kvbm_offload_blocks_d2h, kvbm_offload_blocks_h2d, kvbm_offload_blocks_d2d, kvbm_onboard_blocks_h2d, kvbm_onboard_blocks_d2d). Those are absolute counts of matched tokens or block transfers, but there is no metric representing the total number of requested/input tokens. Without a denominator, we cannot compute a token-level hit-rate (percentage of request tokens satisfied by KVBM), which is the most meaningful measure of cache effectiveness.

Describe alternatives you've tried

Displaying kvbm_matched_tokens as a time series in Grafana: useful but not sufficient.
Approximating hit-rate from offload/onboard block metrics: inaccurate and potentially misleading.
Computing a derived hit-rate on the server side is possible, but the simpler and more reliable approach is to expose kvbm_requested_tokens_total and compute the hit-rate in Prometheus/Grafana using rate().
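With the new counter in place, the Grafana panel could use a query such as `sum(rate(kvbm_matched_tokens[5m])) / sum(rate(kvbm_requested_tokens_total[5m]))`. This assumes kvbm_matched_tokens is also a monotonically increasing counter. rate() divides each counter's increase by the same window length, so the window cancels and the result is matched-token increase divided by requested-token increase. The sketch below reproduces that arithmetic from raw scrape samples. It is a simplification: Prometheus also extrapolates to the window edges, which this code omits. Counter resets are handled the same way Prometheus handles them, by treating a drop in value as a restart from zero.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def counter_increase(samples: Sequence[Sample]) -> float:
    """Total increase of a counter across consecutive samples, tolerating resets."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A decrease means the process restarted and the counter began at 0,
        # so the whole post-reset value counts as new increase.
        total += cur - prev if cur >= prev else cur
    return total


def window_hit_rate(matched: Sequence[Sample],
                    requested: Sequence[Sample]) -> Optional[float]:
    """matched_tokens / requested_tokens over the same scrape window."""
    denominator = counter_increase(requested)
    if denominator == 0:
        return None  # no requests in the window: hit-rate is undefined
    return counter_increase(matched) / denominator


# Scrapes every 15 s; the process restarts between t=15 and t=30.
requested = [(0, 10_000), (15, 12_000), (30, 500)]
matched = [(0, 4_000), (15, 5_000), (30, 200)]
print(window_hit_rate(matched, requested))  # (1000 + 200) / (2000 + 500) = 0.48
```

Returning None on an idle window, rather than 0, keeps the hit-rate from falsely dropping to zero when no requests arrive. Grafana's division behaves similarly and shows no data for that interval.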

Metadata


Assignees

No one assigned

    Labels

    enhancement: New feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
