Skip to content

Conversation

vMaroon
Copy link
Member

@vMaroon vMaroon commented Jul 15, 2025

Summary

Added kvblock.Index metrics collection for prometheus, with support for logging periodically.

Copy link
Collaborator

@elevran elevran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all GIE metrics are added to the controller-runtime metrics registry which is then served from main.
You should be registering metrics into the same registry.
See https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/4ffb5f6dd4f6d004cfbcde6dd4da78009acdfba9/pkg/epp/metrics/metrics.go#L262 for example.

@vMaroon
Copy link
Member Author

vMaroon commented Jul 16, 2025

Thank @elevran, I split the implicit registration call (init) into Collectors and Register (which now uses the controller-runtime metrics registry) for granular control.

Copy link
Collaborator

@elevran elevran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(very :-)) minor nit: consider tracking only hits, not both hits and miss

Comment on lines 23 to 33
// LookupRequests counts how many Lookup() calls have been made.
LookupRequests = prometheus.NewCounter(prometheus.CounterOpts{
Namespace: "kvcache", Subsystem: "index", Name: "lookup_requests_total",
Help: "Total number of lookup calls",
})
// KeyLookupResults counts keys examined, labelled by result “hit” or
// “miss”.
KeyLookupResults = prometheus.NewCounterVec(prometheus.CounterOpts{
Namespace: "kvcache", Subsystem: "index", Name: "lookup_key_total",
Help: "Number of keys looked up by result (hit/miss)",
}, []string{"result"})
Copy link
Collaborator

@elevran elevran Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to decipher what these mean.
if you have two counters, one for total lookups and one for total hits, you can use PromQL later to calculate rates over time periods as well as "trends".
The LookupRequests seems to be the total lookups. Is KeyLookupResults just two counters - one for hit and one for miss?
Can't you do miss-rate = (total-hit)/total on the fly instead of counting both?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, done.

Signed-off-by: Maroon Ayoub <[email protected]>
@vMaroon vMaroon merged commit 0ac45c6 into main Jul 16, 2025
1 check passed
@vMaroon vMaroon deleted the metrics branch July 16, 2025 11:41
Help: "Total number of lookup calls",
})
// LookupHits counts how many keys were found in the cache on Lookup().
LookupHits = prometheus.NewCounter(prometheus.CounterOpts{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do it per request?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

won't that cause an infinite number of metrics (one per request since each would have a different label)?
If this is mostly for debug/validation, maybe we can consider a "tracing" header to the request so it logs the hit ratio when done? Requests without the special header won't log information to avoid flooding the log.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants