[FEATURE]: Add KVBM token hit-rate metric

### Feature request

Add a cumulative counter for requested tokens (e.g. `kvbm_requested_tokens_total`) so users can compute the KVBM token hit-rate in Prometheus/Grafana as `matched_tokens` / `requested_tokens`. The counter should be registered in the KVBM metrics registry and updated once per incoming request, incremented by that request's token count.
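
To make the intended semantics concrete, here is a minimal sketch using only the Python standard library, not the project's actual metrics registry. `TokenCounter` and `on_request` are hypothetical names for illustration. The sketch also assumes `kvbm_matched_tokens` behaves as a cumulative counter.

```python
import threading


class TokenCounter:
    """Hypothetical stand-in for a monotonically increasing counter metric."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._value = 0
        self._lock = threading.Lock()

    def inc_by(self, amount: int) -> None:
        # Cumulative counters only go up, so reject negative increments.
        if amount < 0:
            raise ValueError("counter increments must be non-negative")
        with self._lock:
            self._value += amount

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


# Proposed denominator and existing numerator (names taken from this issue).
requested_tokens = TokenCounter("kvbm_requested_tokens_total")
matched_tokens = TokenCounter("kvbm_matched_tokens")


def on_request(num_request_tokens: int, num_matched_tokens: int) -> None:
    """Record one incoming request.

    The requested counter grows by the request's full token count, and the
    matched counter grows by the tokens satisfied by KVBM.
    """
    requested_tokens.inc_by(num_request_tokens)
    matched_tokens.inc_by(num_matched_tokens)
```

Updating both counters at the same point in the request path keeps the numerator and denominator consistent, so their ratio stays within 0 to 1.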

### Describe the problem you're encountering

Currently the repo exposes `kvbm_matched_tokens` and several offload/onboard block counters (`kvbm_offload_blocks_d2h`, `kvbm_offload_blocks_h2d`, `kvbm_offload_blocks_d2d`, `kvbm_onboard_blocks_h2d`, `kvbm_onboard_blocks_d2d`). Those are absolute counts of matched tokens or block transfers, but there is no metric representing the total number of requested/input tokens. Without a denominator, we cannot compute a token-level hit-rate (percentage of request tokens satisfied by KVBM), which is the most meaningful measure of cache effectiveness.

### Describe alternatives you've tried

- Displaying `kvbm_matched_tokens` as a time series in Grafana: useful but not sufficient, since it has no denominator.
- Approximating hit-rate from offload/onboard block metrics: inaccurate and potentially misleading.
- Computing a derived rate on the server side only: possible, but the simpler, more reliable approach is to expose `kvbm_requested_tokens_total` and compute the hit-rate in Prometheus/Grafana using `rate()`.
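
As a rough illustration of the dashboard-side calculation, the sketch below computes a windowed hit-rate from two samples of each counter. This is essentially what a ratio of two `rate()` expressions reduces to, since the window length cancels out. The function name `windowed_hit_rate` is hypothetical, and it assumes both metrics are cumulative counters. In PromQL, the equivalent would presumably look something like `rate(kvbm_matched_tokens[5m]) / rate(kvbm_requested_tokens_total[5m])`, though the exact query depends on how the metrics end up being labeled.

```python
from typing import Optional


def windowed_hit_rate(
    matched_start: float,
    matched_end: float,
    requested_start: float,
    requested_end: float,
) -> Optional[float]:
    """Token hit-rate over a window, from counter samples at its start and end.

    Returns None when the requested-token counter did not increase in the
    window (no traffic, or a counter reset), because the ratio is undefined.
    """
    matched_delta = matched_end - matched_start
    requested_delta = requested_end - requested_start
    if requested_delta <= 0:
        return None
    return matched_delta / requested_delta


# Example: 300 of the 1000 tokens requested in the window were matched.
hit_rate = windowed_hit_rate(100, 400, 1000, 2000)
```

Returning `None` for an empty window avoids a division by zero and lets the dashboard show a gap rather than a misleading 0% hit-rate.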

[FEATURE]: Add KVBM token hit-rate metric #4840
