-
Notifications
You must be signed in to change notification settings - Fork 27
Support for per-zone PDB #253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add an integration test to cover this functionality?
development/custom-resource-definition-pod-disruption-zone-budget.yaml
Outdated
Show resolved
Hide resolved
development/custom-resource-definition-pod-disruption-zone-budget.yaml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't finished reviewing the tests - will take a look tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest splitting out the changes related to updating the TLS certificate configuration to their own PR, so that this PR remains focused on the ZPDB implementation.
I still need to review the tests in pkg/admission/pod_eviction_test.go
.
pkg/admission/pod_eviction_test.go
Outdated
statefulSetZoneA = "ingester-zone-a" | ||
statefulSetZoneB = "ingester-zone-b" | ||
statefulSetZoneC = "ingester-zone-c" | ||
testPodZoneA0 = "ingester-zone-a-0" | ||
testPodZoneA1 = "ingester-zone-a-1" | ||
testPodZoneA2 = "ingester-zone-a-2" | ||
testPodZoneB0 = "ingester-zone-b-0" | ||
testPodZoneB1 = "ingester-zone-b-1" | ||
testPodZoneB2 = "ingester-zone-b-2" | ||
testPodZoneC0 = "ingester-zone-c-0" | ||
testPodZoneC1 = "ingester-zone-c-1" | ||
testPodZoneC2 = "ingester-zone-c-2" | ||
testNamespace = "test-dev-0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] I'm not sure these constants add much - I'd be inclined to inline them.
Includes; * new web-hook handler for pod eviction added to rollout-operator * manifests for new custom resource definition for configuring the PDBZ * updated development example files and environment
* rename to ZoneAwarePodDisruptionBudget * updated root README * updates to main eviction handler per PR *
* updates per PR review * added ZPDB informer to monitor ZPDB configurations * added Pod informer to work with pod eviction cache * added pod eviction cache * added validating webhook configuration for ZPDB files * added locking for concurrent eviction handling
Modify the rollout-operator to use an informer to monitor for new webhook configurations. Webhook configurations need to be patched with a self signed CA, and previously this was only checked and applied on rollout-operator startup. This change allows for new webhooks to be created once the rollout-operator is running and have the CA patch applied.
…is reporting as running
Modify the rollout-operator to use an informer to monitor for new webhook configurations. Webhook configurations need to be patched with a self signed CA, and previously this was only checked and applied on rollout-operator startup. This change allows for new webhooks to be created once the rollout-operator is running and have the CA patch applied.
…is reporting as running
…com/grafana/rollout-operator into andrewhall-194-support-per-zone-pdb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you might have missed my earlier comments on pkg/admission/pod_eviction_test.go
.
pkg/zpdb/config.go
Outdated
func (c *config) min(a int32, b int32) int32 { | ||
if a < b { | ||
return a | ||
} | ||
return b | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go has a built-in min
function that we should be able to use instead of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the math.min() which applies to float64 ... is the preference to cast to a float and then back to an int32 or is there another function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tcp13equals2 Charles is referring to the built-in min
function, which is generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha - nice. Thanks for the tip!
Update short name for zdpb crd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did my best to read through this to see if anything jumped out at me. Only found a couple small things. I still don't feel like I have a great mental model of this yet, but this has clearly already been through a lot of review by Charles.
Co-authored-by: Andy Asp <[email protected]>
This is a PR for the implementation of a custom pod disruption budget which is zone and partition aware.
#194
Included in this PR;
Note - this has been tested in mimir-dev-11 in the classic zones configuration draining multiple nodes which span multiple zones + pods.
Note - from a deployment perspective, when this is rolled out to real clusters it is likely it will have to rolled out in 2 phases. Phase 1 would have the updated rollout operator, new CRD, the validating webhook configurations but not the actual new zpdb configuration. This kind relies on the validating webhook to be in-place first, and if this validating webhook has just been added it can be a short wait until it has the CA bundle patched in allowing the endpoint to function. If the zpdb is applied immediately after the validating webhook is added it will fail. So Phase 2 would be rolling out the actual zpdb files which replace the existing pdb file.
Deployment tools libsonnet updates - https://github.com/grafana/deployment_tools/pull/323553
Mimir libsonnet updates - grafana/mimir#12315 (in progress)