
@bnallapeta
Contributor

What this PR does / why we need it:
ClusterClass-managed MachinePools were not triggering node rollouts when their BootstrapConfig or InfrastructureMachinePool templates changed. This happened because updateMachinePool() used reconcileReferencedObject(), which patches the referenced objects in place without changing their names. Infrastructure providers (like CAPA) watch for changes to configRef.name to trigger rollouts; since the name never changed, no rollout occurred.
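A minimal Go sketch of why an in-place patch is invisible to a name-based check. It assumes a provider that decides on rollouts by comparing only the referenced object's name; `ObjectRef` and `needsRollout` are hypothetical simplifications for illustration, not CAPA's actual code.

```go
package main

import "fmt"

// ObjectRef is a hypothetical, simplified stand-in for the configRef field
// on a MachinePool; only the name matters for this illustration.
type ObjectRef struct {
	Kind string
	Name string
}

// needsRollout models the provider behavior described above (an assumption):
// a rollout is triggered only when the referenced object's name changes.
// Changes made to the referenced object in place, under the same name,
// are not detected by this check.
func needsRollout(oldRef, newRef ObjectRef) bool {
	return oldRef.Name != newRef.Name
}

func main() {
	before := ObjectRef{Kind: "KubeadmConfig", Name: "test-cluster-mp-0-2krrr"}

	// In-place patch: the object's contents change but the reference does not.
	inPlace := before
	fmt.Println("in-place patch triggers rollout:", needsRollout(before, inPlace))

	// Template rotation: a new object with a new name is referenced instead.
	rotated := ObjectRef{Kind: "KubeadmConfig", Name: "test-cluster-mp-0-sw2m8"}
	fmt.Println("rotation triggers rollout:", needsRollout(before, rotated))
}
```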

This PR changes updateMachinePool() to use reconcileReferencedTemplate() instead. This is the same approach used by updateMachineDeployment().
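The rotation idea can be sketched in Go as follows. This is a hypothetical, in-memory model, not the actual reconcileReferencedTemplate() implementation: `Config`, `Store`, `rotateIfChanged`, and the 5-character random suffix are all assumptions chosen to resemble the generated names in the test output further down. If the desired spec matches the current object, the name is kept; otherwise a new object with a new name is created, and the caller would point configRef.name at it.

```go
package main

import (
	"fmt"
	"math/rand"
	"reflect"
)

// Config is a hypothetical stand-in for a bootstrap or infrastructure object
// generated from a template; Spec holds the fields copied from the template.
type Config struct {
	Name string
	Spec map[string]string
}

// Store is a hypothetical in-memory object store keyed by object name.
type Store map[string]Config

const suffixChars = "bcdfghjklmnpqrstvwxz2456789"

// randomSuffix returns a 5-character suffix (illustrative only).
func randomSuffix() string {
	b := make([]byte, 5)
	for i := range b {
		b[i] = suffixChars[rand.Intn(len(suffixChars))]
	}
	return string(b)
}

// rotateIfChanged returns currentName when the desired spec is unchanged.
// Otherwise it creates a NEW object under a NEW, unused name and returns that
// name, so the caller can update configRef.name. The old object is left in
// the store; cleaning it up is out of scope for this sketch.
func rotateIfChanged(store Store, currentName, namePrefix string, desired map[string]string) string {
	if current, ok := store[currentName]; ok && reflect.DeepEqual(current.Spec, desired) {
		return currentName
	}
	newName := namePrefix + "-" + randomSuffix()
	for {
		if _, taken := store[newName]; !taken {
			break
		}
		newName = namePrefix + "-" + randomSuffix()
	}
	store[newName] = Config{Name: newName, Spec: desired}
	return newName
}

func main() {
	current := "test-cluster-mp-0-2krrr"
	store := Store{current: {Name: current, Spec: map[string]string{"files": ""}}}

	// Unchanged template: the reference keeps its name, so no rollout.
	unchanged := rotateIfChanged(store, current, "test-cluster-mp-0", map[string]string{"files": ""})
	fmt.Println(unchanged == current) // true

	// Changed template (e.g. a file added): a new object under a new name.
	rotated := rotateIfChanged(store, current, "test-cluster-mp-0", map[string]string{"files": "/etc/example"})
	fmt.Println(rotated != current, len(store)) // true 2
}
```

Keeping the old object around in this sketch matches what the manual test output shows (the old KubeadmConfig still exists after rotation); whether and when the real controller deletes it is not covered here.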

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #10496

/area machinepool
/area clusterclass

@k8s-ci-robot k8s-ci-robot added area/machinepool Issues or PRs related to machinepools area/clusterclass Issues or PRs related to clusterclass labels Dec 5, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign joelspeed for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 5, 2025
@bnallapeta bnallapeta marked this pull request as draft December 5, 2025 04:17
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 5, 2025
@bnallapeta
Contributor Author

@richardcase @AndiDog
Please take a look at this. I am in the process of testing this manually and will write e2e tests, but early feedback on the approach would help.

		desired: desiredMP.BootstrapObject,
	}); err != nil {
		if createdInfra {
			infrastructureMachinePoolCleanupFunc = func() {


Just curious, why are we creating an inline function instead of defining it somewhere else? I'm not as familiar with Kubernetes code bases, so disregard this if there's a good reason.

Contributor Author


a. I'd like to stay consistent with the current style, since this pattern is already used in several other places.
b. Inline functions keep the logic right where we make the decision (whether or not to clean up). The alternative would require passing in the context, the object, and the logger, and handling the errors differently, which feels like too much for this purpose.

@bnallapeta
Contributor Author

Just finished testing the fix manually with CAPD and it's working as expected!

Summary:

  • Patched a KubeadmConfigTemplate to add a file
  • Topology controller correctly created a new KubeadmConfig and updated the MachinePool.spec.template.spec.bootstrap.configRef
  • Template rotation is now working for ClusterClass MachinePools
$ kubectl get kubeadmconfig -l cluster.x-k8s.io/cluster-name=test-cluster
NAME                       AGE
test-cluster-mp-0-2krrr    58m  # old config
test-cluster-mp-0-sw2m8    2m   # new config (rotated!)

$ kubectl get machinepool $MACHINEPOOL_NAME -o jsonpath='{.spec.template.spec.bootstrap.configRef.name}'
test-cluster-mp-0-sw2m8  # <-- reference updated correctly

Why CAPD?
I checked CAPA first, but it's missing AWSMachinePoolTemplate (verified in its codebase). CAPD is currently the only provider with the template CRDs needed for ClusterClass MachinePools.

Next steps:
Moving on to E2E tests. I'll look into how to add test coverage for this; any pointers are much appreciated!

(Detailed testing steps available if anyone wants to reproduce)




Development

Successfully merging this pull request may close these issues.

ClusterClass MachinePool implementation is probably not able to rollout BootstrapConfig changes.

3 participants