VAULT-38888 Add prefix vault to metric summary definitions #31489
base: main
…definition' into VAULT-38888-fix-metrics-summary-definition
I have left a comment about keeping metric names consistent across the codebase.
@@ -56,15 +56,15 @@ func init() {
 // telemetry reference docs so if updated should probably stay in sync.
 metricregistry.RegisterSummaries([]metricregistry.SummaryDefinition{
 {
-Name: []string{"core", "step_down"},
+Name: []string{"vault", "core", "step_down"},
Instead of hardcoding metric name slices in multiple places, we should define them once as package-level variables and reuse them:
var (
	metricCoreStepDown              = []string{"vault", "core", "step_down"}
	metricCoreLeadershipLost        = []string{"vault", "core", "leadership_lost"}
	metricCoreLeadershipSetupFailed = []string{"vault", "core", "leadership_setup_failed"}
)
Instead of the hard-coded value metrics.MeasureSince([]string{"core", "step_down"}, time.Now()), use metrics.MeasureSince(metricCoreStepDown, startTime).
The same change should also be applied at lines 314, 586, 595, 659, 684, 696, 716, and 753.
This improves maintainability, reduces duplication, and keeps metric names consistent across the codebase.
Description
This PR fixes an issue where duplicate metrics for core_leadership_lost, core_step_down, and leadership_setup_failed were being exposed on the /sys/metrics?format=prometheus endpoint. Refer to this escalation ticket.
The Problem
Currently, the endpoint returns two versions of these metrics:
core_leadership_lost_*: A statically registered metric that is always zero.
vault_core_leadership_lost_*: The correct metric, which is properly populated at runtime.
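The two key forms listed above can be reproduced with a small stand-alone sketch. It does not use Vault's or go-metrics' actual code: `emittedKey` and `registeredKey` are hypothetical helpers, and the sketch assumes that runtime emission prepends the service name while a static registration uses its name parts verbatim.

```go
package main

import (
	"fmt"
	"strings"
)

// serviceName is the prefix assumed to be added to emitted metric keys.
const serviceName = "vault"

// emittedKey models a runtime emission: the service name is prepended
// before the parts are joined (an assumption made for this sketch).
func emittedKey(parts []string) string {
	return strings.Join(append([]string{serviceName}, parts...), "_")
}

// registeredKey models a static pre-registration: parts are joined verbatim.
func registeredKey(parts []string) string {
	return strings.Join(parts, "_")
}

func main() {
	unprefixed := []string{"core", "leadership_lost"}
	prefixed := []string{"vault", "core", "leadership_lost"}

	fmt.Println(registeredKey(unprefixed)) // core_leadership_lost (never updated)
	fmt.Println(emittedKey(unprefixed))    // vault_core_leadership_lost (runtime)
	fmt.Println(registeredKey(prefixed))   // vault_core_leadership_lost (matches runtime)
}
```

Under these assumptions, a registration without the prefix creates a second key that no emission ever touches, while a registration that includes the prefix lands on the same key as the runtime metric.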
This duplication is caused by a previous fix (PR #27966) that pre-registered the metrics to ensure their consistent availability. However, that registration did not account for the service name prefix added to metrics (in this case, vault).
The Solution
This change resolves the issue by adding the vault prefix to the static metric registrations. This aligns the pre-registered keys with the runtime keys, eliminating the redundant, zero-value metrics from the output.
TODO only if you're a HashiCorp employee
backport/ label that matches the desired release branch. Note that in the CE repo, the latest release branch will look like backport/x.x.x, but older release branches will be backport/ent/x.x.x+ent.
of a public function, even if that change is in a CE file, double check that
applying the patch for this PR to the ENT repo and running tests doesn't
break any tests. Sometimes ENT only tests rely on public functions in CE
files.
in the PR description, commit message, or branch name.
description. Also, make sure the changelog is in this PR, not in your ENT PR.
PCI review checklist
Examples of changes to security controls include using new access control methods, adding or removing logging pipelines, etc.