🐛 Deletion Deadlock #1578

@spjmurray

/kind bug

What steps did you take and what happened:

@mdbooth your input is treasured as per usual...

When I delete a cluster (properly this time: you can prevent Argo from deleting anything and let CAPI manage the whole process), the control plane machines hang.

From what I can tell there is a race where:

  • CAPI adds an owner reference on the infrastructure
  • I delete the Cluster
  • CAPI should delete... the MDs, the KCP, then the infrastructure
  • However the owner reference triggers the infrastructure delete early...

What happens is CAPO keeps trying to delete the infra and the CP machines at the same time.
As soon as the ports are detached from the network, the infrastructure delete completes and the infrastructure resource is gone.
However the CP machines haven't been fully deleted yet, and cannot be deleted because they need to see the infrastructure resource in order to determine whether there's a load balancer that needs reconciling.
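
To make the suspected race concrete, here is a toy model of it. This is not CAPO or CAPI code: `Store`, `reconcile_infra_delete`, `reconcile_machine_delete` and `run` are all hypothetical names, and the assumption that the machine reconcile is what detaches the port is a simplification made only for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical stand-in for the API server plus the network state."""
    objects: set = field(default_factory=lambda: {"infra", "cp-machine"})
    ports_attached: bool = True


def reconcile_infra_delete(store: Store) -> None:
    # In this model the infra delete only waits for the ports to be detached.
    if "infra" in store.objects and not store.ports_attached:
        store.objects.discard("infra")


def reconcile_machine_delete(store: Store) -> str:
    # The machine delete detaches its port first, then needs the infra object
    # to decide whether a load balancer has to be reconciled.
    if "cp-machine" not in store.objects:
        return "deleted"
    if store.ports_attached:
        store.ports_attached = False
        return "port detached, requeue"
    if "infra" not in store.objects:
        return "stuck: infra not found"
    store.objects.discard("cp-machine")
    return "deleted"


def run(ordered: bool, rounds: int = 5) -> str:
    """Return the final status of the control plane machine delete."""
    store = Store()
    status = ""
    for _ in range(rounds):
        status = reconcile_machine_delete(store)
        machines_gone = "cp-machine" not in store.objects
        # ordered=True models CAPI's own ordering (machines first, then infra).
        # ordered=False models the owner reference triggering the infra
        # delete early, so both reconciles run side by side.
        if not ordered or machines_gone:
            reconcile_infra_delete(store)
    return status
```

In this model `run(ordered=False)` ends with the machine stuck because the infra object is already gone, while `run(ordered=True)` deletes the machine and only then the infra.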

I expect either:

  • CAPI needs to stop messing with owner references and let its internal ordering take precedence
  • CAPO needs to cache infrastructure configuration in the machines so it doesn't need to refer to the OSC resource
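
A rough sketch of what the second option might look like, again as a toy model rather than real CAPO types. `InfraSnapshot`, `Machine`, `on_create` and `plan_delete` are hypothetical names, and the single `load_balancer_enabled` field is an assumption about which piece of cluster configuration the machine delete actually needs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InfraSnapshot:
    """Hypothetical subset of infra config a machine needs at delete time."""
    load_balancer_enabled: bool


@dataclass
class Machine:
    name: str
    cached_infra: Optional[InfraSnapshot] = None


def on_create(machine: Machine, infra: dict) -> None:
    # Copy what the delete path will need while the infra object still exists.
    machine.cached_infra = InfraSnapshot(infra["load_balancer_enabled"])


def plan_delete(machine: Machine, live_infra: Optional[dict]) -> List[str]:
    # Prefer the live infra object, fall back to the cached copy if it is gone.
    if live_infra is not None:
        lb_enabled = live_infra["load_balancer_enabled"]
    elif machine.cached_infra is not None:
        lb_enabled = machine.cached_infra.load_balancer_enabled
    else:
        raise LookupError("no infrastructure config available for " + machine.name)
    steps = []
    if lb_enabled:
        steps.append("remove load balancer member")
    steps.append("delete server")
    return steps
```

With a snapshot taken at creation time, `plan_delete(machine, None)` can still work out whether a load balancer needs reconciling after the infrastructure resource has disappeared.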

What did you expect to happen:

You can actually delete things without a hang.

Anything else you would like to add:

Environment:

  • Cluster API Provider OpenStack version (Or git rev-parse HEAD if manually built): 0.7.1
  • Cluster-API version: 1.3.2
  • OpenStack version: Zed
  • Minikube/KIND version:
  • Kubernetes version (use kubectl version): 1.27
  • OS (e.g. from /etc/os-release):
