-
Notifications
You must be signed in to change notification settings - Fork 288
⚠️ Enable removal of ExternallyProvisioned attribute from BareMetalHosts #2566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Hi @ethan-gallant. Thanks for your PR. I'm waiting for a metal3-io member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
ce225df to
8b7a6d5
Compare
Signed-off-by: Ethan J. Gallant <[email protected]>
8b7a6d5 to
f5aa16b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this topic would need some community consensus at least a lazy one. I don't remember at the moment why was it decided to block an externally provisioned node to transition to the regular provisioned state.
In principle I don't see why we shouldn't allow this transition but I am not using this feature on a daily basis so I am interested in the opinions of those who do might have more experience with the use-case, they might know what would be the side effects of allowing the proposed state transition.
| // if the credentials change and the Host must be re-registered. | ||
| if hsm.Host.Spec.ExternallyProvisioned { | ||
| hsm.NextState = metal3api.StateExternallyProvisioned | ||
| } else if hsm.Host.Status.Provisioning.State == metal3api.StateExternallyProvisioned { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this implementation move all externally provisioned BMH to "regular provisioned" state even if the user would add an unrelated annotation or the reconciliation would be triggered for whatever other reason?
It might be just a theoretical scenario but I would like to have an an additionl condition that checks that the "ExtarnallyProvisioned" filed is really set to false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this lands in an else block after the check for externally provisioned I don't believe that would be possible here.
Happy to reformat it as well into a switch statement or similar as the conditional branches here are getting a bit wild already before the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe this does anything at all, because in handleRegistering() the hsm.Host.Status.Provisioning.State is, by definition, StateRegistering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this lands in an else block after the check for externally provisioned I don't believe that would be possible here.
Happy to reformat it as well into a switch statement or similar as the conditional branches here are getting a bit wild already before the change.
That is true , silly me :D .
But then there is the other thing zaneb pointed out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original bug is about the incorrect state in Ironic: once we get to Inspecting, BMO does not expect to find the Node in the active state in Ironic, but that's what actually happens. BMO needs to deprovision the host first. Then make sure that inspection still happen.
| Expect(bmh.Status.OperationHistory.Inspect.Start.IsZero()).To(BeTrue()) | ||
| Expect(bmh.Status.OperationHistory.Provision.Start.IsZero()).To(BeTrue()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure these two conditions mean that the BMH is still in registering as they would never be true in case of an externally provisioned BMH at least that would be my expectation.
I assume you have run this test locally but I wonder how long does the BMH stay in "Registering" state that your implementation relies on.
How do you ensure that the BMH stays in registering state to allow the test code to do the bmh.Spec.ExternallyProvisioned field update? I am not a user of the externally provisioned feature so I might be missing something but I would have suspected the BMH to very quickly transition out of "Registering" and If that is true this test would be very prone to timing issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a manual test however I had some issues getting the e2e tests running locally with make test-e2e as the e2e image from quay.io could not be loaded.
I'll try again today as it seems that the tests did not align with what I saw manually testing. If you have any guidance that I might have missed getting the E2E working locally (on Linux / MacOS) that would be greatly appreciated.
|
I have changed this from bug to breaking change, and IMO this is a feature not a bug as blocking the transition between externally and regularly provisioned state was intentional and documented. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be useful if you want to transition hosts from an externally provisioned source into being directly managed.
I was going to say that the question is why you need to do that.
An externally provisioned host is still fully managed in terms of power state.
The other thing you can do with a provisioned host is deprovision it... but that's what happens when you remove the externallyProvisioned flag anyway.
However, I see that somebody has prevented you from removing the flag in the webhook. That is a real bug in my opinion. Otherwise (e.g. with the webhook disabled) it already works correctly and as-documented as far as I know.
| // if the credentials change and the Host must be re-registered. | ||
| if hsm.Host.Spec.ExternallyProvisioned { | ||
| hsm.NextState = metal3api.StateExternallyProvisioned | ||
| } else if hsm.Host.Status.Provisioning.State == metal3api.StateExternallyProvisioned { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe this does anything at all, because in handleRegistering() the hsm.Host.Status.Provisioning.State is, by definition, StateRegistering.
| errs = append(errs, errors.New("bootMACAddress can not be changed once it is set")) | ||
| } | ||
|
|
||
| if oldObj.Spec.ExternallyProvisioned != newObj.Spec.ExternallyProvisioned { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a valid bug fix.
According to the state diagram, it is valid to remove the externallyProvisioned flag: https://github.com/metal3-io/baremetal-operator/blob/main/docs/baremetalhost-states.md
I have no idea why the webhook is preventing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The linked issue gives a bit more background, the webhook seems to be a bandaid fix to prevent the host from getting stuck in the provisioning state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To echo what I said in the bug: if removing externallyProvisioned unconditionally deprovisions the host, we need to make sure that image and customDeploy are not set.
Or we need to permit them to be set and keep the host Provisioned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like I may have misunderstood the issue and this will need a bit of additional work.
Just to confirm (since I was not at the working group meetings) the desired behavior is:
- If the
Image&customDeployare set- Move the host to the
Provisionedstate since the host complies with the standard setup, it was just (as expected)externallyProvisioned
- Move the host to the
- If the host does not have the
image&customDeployfields set- Deprovision the host since the host can not be managed in the standard fashion
Overall I'm wondering if the approach originally for the externallyProvisioned would have been to follow the cert-manager pattern and had a ProvisionRequest CR where some external system could fulfill the provision instead of hacking around the BareMetalHost CR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I'm wondering if the approach originally for the externallyProvisioned would have been to follow the cert-manager pattern and had a ProvisionRequest CR where some external system could fulfill the provision instead of hacking around the BareMetalHost CR.
Just to reflect on this point, this would be a nice approach but in practice the system that has done the provisioning might not be cloud native or not even automated at all so a ProvisionRequest CR would provide some optional extra functionality, but it would still not cover all the use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I'm wondering if the approach originally for the externallyProvisioned would have been to follow the cert-manager pattern and had a ProvisionRequest CR where some external system could fulfill the provision instead of hacking around the BareMetalHost CR.
Well, we should have had a separate CRD for provisioning for sure. The HostClaim work metal3-io/metal3-docs#408 is a step in that direction but it does not address externally provisioned.
Just to confirm (since I was not at the working group meetings) the desired behavior is
I think your summary is correct, except that: "If the Image or customDeploy are set" (one is enough).
|
On further inspection, I see that the webhook change was added recently to prevent people encountering bug #2465. |
| hsm.NextState = metal3api.StateExternallyProvisioned | ||
| } else if hsm.Host.Status.Provisioning.State == metal3api.StateExternallyProvisioned { | ||
| // Removing externallyManaged moves hosts to provisioned state | ||
| hsm.NextState = metal3api.StateProvisioned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See Zane's comment: this is a wrong place. You need to fix handleExternallyProvisioned instead.
| err = clusterProxy.GetClient().Update(ctx, &bmh) | ||
| Expect(err).NotTo(HaveOccurred()) | ||
|
|
||
| Eventually(func(g Gomega) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better use WaitForBmhInProvisioningState
|
|
||
| Eventually(func(g Gomega) { | ||
| err = clusterProxy.GetClient().Get(ctx, kclient.ObjectKeyFromObject(&bmh), &bmh) | ||
| g.Expect(bmh.Status.Provisioning.State).To(Equal(metal3api.StateProvisioned)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way the test is written, the expected state is not Provisioned. Or even: it make become Provisioned briefly, then BMO detects that there is no image and no customDeploy and deprovisions.
|
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What this PR does / why we need it:
It enables removing the ExternallyProvisioned attribute, this can be useful if you want to transition hosts from an externally provisioned source into being directly managed.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)format, will close the issue(s) when PR gets merged):Fixes #2465
Signed-off-by: Ethan J. Gallant [email protected]