🐛 if the openstackcluster was ready, we don't want to set a terminalError #2099
Hi @qeqar. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/ok-to-test
What seems wrong to me is that if a user changes the bastion spec to a new flavor that doesn't exist in Nova, the controller fails to create the bastion and wants to update |
As this is fixable by correcting the flavor name in the CR, it seems to be the same type of error, so it should not be permanent either. Setting ready to false is not done anywhere at the moment, so this would change the behavior. If that is OK, I can put it in the else path, but then I need to find the proper place(s) to set it to true again. Edit: I am also not sure what CAPI will do if it goes to ready false while provisioned.
After some additional thinking: setting ready to false would undermine the validity of my if clause. First, it would need to be set on every error, since a permanent error is more critical and should result in ready false. Second, if we have two problems, for example some error occurs, we activate the bastion host to have a look, and then make a flavor typo, it will go directly to a permanent error because ready was already false. If we start setting ready to false after it was set to true, we need to find a different switch between permanent and transient errors.
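To make that concern concrete, here is a minimal Go sketch of the switch under discussion. The type `clusterStatus`, its fields, the helper `recordError`, and the sentinel `errFlavorNotFound` are hypothetical stand-ins for illustration, not the controller's real API. The sketch only assumes that an error on a ready cluster is recorded as transient and any other error as terminal.

```go
package main

import (
	"errors"
	"fmt"
)

// clusterStatus is a hypothetical, simplified stand-in for the
// OpenStackCluster status; the real type has different fields.
type clusterStatus struct {
	Ready          bool
	TerminalError  string // non-empty means the cluster is marked as failed
	TransientError string // non-empty means a recoverable problem was seen
}

// errFlavorNotFound is an illustrative error, e.g. a typo in the bastion flavor.
var errFlavorNotFound = errors.New("flavor not found")

// recordError uses Ready as the switch: a ready cluster only gets a
// transient error, a cluster that is not ready gets a terminal one.
func recordError(s *clusterStatus, err error) {
	if s.Ready {
		s.TransientError = err.Error()
		return
	}
	s.TerminalError = err.Error()
}

func main() {
	// The cluster was ready, so the flavor typo stays recoverable.
	ready := &clusterStatus{Ready: true}
	recordError(ready, errFlavorNotFound)
	fmt.Printf("ready=true:  terminal=%q transient=%q\n", ready.TerminalError, ready.TransientError)

	// If an earlier error had already reset Ready to false, the same
	// typo is recorded as terminal, which is the problem described above.
	notReady := &clusterStatus{Ready: false}
	recordError(notReady, errFlavorNotFound)
	fmt.Printf("ready=false: terminal=%q transient=%q\n", notReady.TerminalError, notReady.TransientError)
}
```

The second case is why resetting ready to false would need a different signal, such as a separate "was ready once" marker, to keep distinguishing the two error classes.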
I've asked about setting |
Force-pushed from 2dd8453 to 9703ff1
That's a lot of cases! I will sympathise entirely if you just want to cover the easy ones, including the one that's biting you.
In general, we want to set a terminal error only for:
- Bad user input
- An unrecoverable error, such as something we previously created no longer existing
The first is really only relevant during initial resource creation, and we should not set terminal errors for this.
The second is more complex in the cluster controller. My advice is just to punt on these for now.
If in doubt it shouldn't be terminal.
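The "if in doubt" rule can be sketched in Go roughly as follows. The `errorKind` values and the `isTerminal` helper are hypothetical names for illustration, not identifiers from the project. The sketch follows one possible reading of the list above: only an explicitly listed unrecoverable kind is terminal, bad user input is treated as non-terminal, and anything unclassified defaults to non-terminal.

```go
package main

import "fmt"

// errorKind is a hypothetical classification used only in this sketch.
type errorKind int

const (
	kindUnknown      errorKind = iota // anything not classified yet
	kindBadUserInput                  // e.g. a flavor name that does not exist
	kindResourceGone                  // something we previously created no longer exists
)

// isTerminal reports whether an error of the given kind should be terminal.
// Only explicitly listed kinds return true; every other kind, including
// ones added later, falls through to false.
func isTerminal(kind errorKind) bool {
	switch kind {
	case kindResourceGone:
		return true
	default:
		return false
	}
}

func main() {
	for _, k := range []errorKind{kindUnknown, kindBadUserInput, kindResourceGone} {
		fmt.Printf("kind=%d terminal=%t\n", k, isTerminal(k))
	}
}
```

Using an allowlist with a `default` branch means a newly introduced error kind is non-terminal until someone deliberately decides otherwise.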
@lentzi90 @EmilienM I'd appreciate a second set of eyes on this list.
I will have a look on Monday if I have some time. After that I am out for the rest of the week, so it could take some time.
Force-pushed from fc58220 to ac13660
Great!
/lgtm
We should probably also think about adding conditions for some of the non-fatal errors at some point.
OK, I am not fully happy with setting the |
Force-pushed from ac13660 to 9f1a352
OK, now rebased on main to pick up the #2109 error class changes, with all occurrences updated.
Could you please squash the commits into one and check my question below?
so we have the else path for some future logging/state improvements
Set most errors to transient/non-fatal
Force-pushed from 2918c08 to 9643251
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: lentzi90 The full list of commands accepted by this bot can be found here. The pull request process is described here
/lgtm
Thank you!
Excellent contribution, thank you!
For example, if the credentials break but the cluster was ready, the operator can fix it.
What this PR does / why we need it:
Use a transient error if the OpenStackCluster was once ready.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2097
Special notes for your reviewer:
/hold