Description
Feature request
Add a cumulative counter for requested tokens (e.g. kvbm_requested_tokens_total) so users can compute the KVBM token hit-rate in Prometheus/Grafana as matched_tokens / requested_tokens. The counter should be registered in the KVBM metrics registry and incremented once per incoming request, by that request's token count.
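To make the intended semantics concrete, here is a minimal sketch of the bookkeeping in plain Python. It is not the KVBM metrics registry: the `Counter` class and the `record_request` helper are hypothetical stand-ins, and only the metric names come from this issue. The point is that both counters are updated once per request, each by a token count.

```python
import threading


class Counter:
    """Hypothetical stand-in for a monotonically increasing metric counter."""

    def __init__(self, name):
        self.name = name
        self._value = 0
        self._lock = threading.Lock()

    def inc(self, amount=1):
        # Cumulative counters must never decrease.
        if amount < 0:
            raise ValueError("counters can only increase")
        with self._lock:
            self._value += amount

    @property
    def value(self):
        with self._lock:
            return self._value


# Metric names from this issue; the registry wiring itself is assumed.
requested_tokens = Counter("kvbm_requested_tokens_total")
matched_tokens = Counter("kvbm_matched_tokens")


def record_request(num_request_tokens, num_matched_tokens):
    """Hypothetical hook, called once per incoming request."""
    requested_tokens.inc(num_request_tokens)  # denominator: all input tokens
    matched_tokens.inc(num_matched_tokens)    # numerator: tokens served by KVBM
```

Updating both counters at the same point in the request path keeps the numerator and denominator aligned, so their ratio over any window stays between 0 and 1.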
Describe the problem you're encountering
Currently the repo exposes kvbm_matched_tokens and several offload/onboard block counters (kvbm_offload_blocks_d2h, kvbm_offload_blocks_h2d, kvbm_offload_blocks_d2d, kvbm_onboard_blocks_h2d, kvbm_onboard_blocks_d2d). Those are absolute counts of matched tokens or block transfers, but there is no metric representing the total number of requested/input tokens. Without a denominator, we cannot compute a token-level hit-rate (percentage of request tokens satisfied by KVBM), which is the most meaningful measure of cache effectiveness.
Describe alternatives you've tried
Displaying kvbm_matched_tokens as a time series in Grafana: useful but not sufficient.
Approximating hit-rate from offload/onboard block metrics: inaccurate and potentially misleading.
We could compute a derived rate only on the server side, but the simpler, reliable approach is to expose kvbm_requested_tokens_total and compute the hit-rate in Prometheus/Grafana using rate().
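With both counters exposed, the hit-rate over a window is the ratio of the two counter increases. In PromQL that would be roughly `rate(kvbm_matched_tokens[5m]) / rate(kvbm_requested_tokens_total[5m])`, where the 5m window is an arbitrary choice. The sketch below shows the same arithmetic in Python on two scrapes of each counter; the `hit_rate` function and its parameter names are hypothetical, and it ignores counter resets, which Prometheus handles separately.

```python
def hit_rate(matched_start, matched_end, requested_start, requested_end):
    """Token hit-rate over a window, from two samples of each cumulative counter.

    Returns None when no tokens were requested in the window, because the
    ratio is undefined in that case.
    """
    requested_delta = requested_end - requested_start
    matched_delta = matched_end - matched_start
    if requested_delta <= 0:
        return None
    return matched_delta / requested_delta


# Example: 1000 tokens requested in the window, 600 of them matched by KVBM.
example = hit_rate(matched_start=1000, matched_end=1600,
                   requested_start=4000, requested_end=5000)
```

Because both rates are taken over the same window, the window length cancels out, which is why a ratio of rate() values gives the hit-rate directly.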