
Conversation

faisal-memon
Contributor

Provides steps for users of Kubernetes Workload Registrar to transition to SPIRE Controller Manager.

  • Uses the spire namespace because that's what Kubernetes Workload Registrar users are likely using. Newer examples use spire-system. Should we steer users there?
  • The examples are all static YAML files. Can switch to kustomize if that's preferred.

Addresses: spiffe/spire#3614

@keeganwitt
Contributor

Do you think it's confusing to have 2 sets of configs in the project? One in the migration directory and one in demo? Should migration reference demo configs instead?

kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.8.0
Contributor

Should this annotation be present?

Contributor Author

It's auto-generated, so I'd rather not remove it.

kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.8.0
Contributor

Should this annotation be present?

@keeganwitt
Contributor

keeganwitt commented Jan 9, 2023

I think this guide should mention what to do if you were creating "downstream" SPIFFEIDs before with the Registrar. Is this not supported yet? EDIT: Filed #43 for this.

@faisal-memon
Contributor Author

> I think this guide should mention what to do if you were creating "downstream" SPIFFEIDs before with the Registrar. Is this not supported yet? EDIT: Filed #43 for this.

@keeganwitt Thanks for catching this. Ill update the FAQ to reflect that part is missing.

@faisal-memon
Contributor Author

> Do you think it's confusing to have 2 sets of configs in the project? One in the migration directory and one in demo? Should migration reference demo configs instead?

@keeganwitt A lot of the config in here is copied from the demo/ and docs/ directories. The demo directory is a more complex setup with 2 clusters, and I wanted this to be simpler. I'm thinking to push this forward as is so we can deprecate k8s-workload-registrar and open a follow-up PR to reduce duplication and clean up the docs. What do you think?

@keeganwitt
Contributor

> @keeganwitt A lot of the config in here is copied from the demo/ and docs/ directories. The demo directory is a more complex setup with 2 clusters, and I wanted this to be simpler. I'm thinking to push this forward as is so we can deprecate k8s-workload-registrar and open a follow-up PR to reduce duplication and clean up the docs. What do you think?

I'm in favor of merging it soon. Incomplete/imperfect documentation is better than none at all.

We won't move off Registrar until it supports downstream/parentId in the CRD. Otherwise, we have to manually set up downstream agents rather than leveraging our Helm deployment. Do you disagree that that level of feature parity should be present before official deprecation?

@faisal-memon
Contributor Author

> I'm in favor of merging it soon. Incomplete/imperfect documentation is better than none at all.
>
> We won't move off Registrar until it supports downstream/parentId in the CRD. Otherwise, we have to manually set up downstream agents rather than leveraging our Helm deployment. Do you disagree that that level of feature parity should be present before official deprecation?

I think the downstream PR is a good one to merge prior to deprecating. In general I don't think we need full feature parity as long as we have alternatives/workarounds to at least move things forward. I try to cover this in the FAQ section of this PR. For example, I don't think we need to support annotation-based registration going forward. Is there a specific feature aside from downstream you think we need to support prior to deprecating k8s-workload-registrar?

@keeganwitt
Contributor

> I think the downstream PR is a good one to merge prior to deprecating. In general I don't think we need full feature parity as long as we have alternatives/workarounds to at least move things forward. I try to cover this in the FAQ section of this PR. For example, I don't think we need to support annotation-based registration going forward. Is there a specific feature aside from downstream you think we need to support prior to deprecating k8s-workload-registrar?

We don't need parity for features that are replaced with other concepts. We weren't using the annotation-based registration before, but the guide should probably address that also (and any others you're aware of). The two features we'd like are the ability to specify the downstream flag and the ability to specify the parent ID.

  bindAddress: 127.0.0.1:8083
leaderElection:
  leaderElect: true
  resourceName: 98c9c988.spiffe.io


What is this name referring to?

Contributor Author

Good question, it's the name of the ConfigMap that's created to do leader election. It's defined here: https://pkg.go.dev/k8s.io/component-base/config/v1alpha1#LeaderElectionConfiguration


Will it always have this name?

Contributor Author

You can use a different name if you like.


Oh I see, wasn't actually sure who generated that ConfigMap. Good to know that we control the name.

Contributor

I did the following in my Helm chart.

resourceName: {{ .Release.Name | sha256sum | trunc 8 }}.spiffe.io
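
For anyone verifying the leader-election lock during a migration, a minimal check is sketched below. The spire namespace and the 98c9c988.spiffe.io name come from the example config above; depending on the controller-runtime version, the lock may be stored as a Lease, a ConfigMap, or both, so one of these commands may return NotFound.

```
# Inspect the lock object that spire-controller-manager creates for leader
# election; its name matches leaderElection.resourceName in the config.
kubectl -n spire get lease 98c9c988.spiffe.io
kubectl -n spire get configmap 98c9c988.spiffe.io
```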

```
2022-12-13T00:41:21.844Z INFO webhook-manager Webhook configuration patched with CABundle
```

### I'm using CRD mode Kubernetes Workload Registrar and it gets stuck deleting the SpiffeId CRD. What do I do?


Since the new CRDs also appear to have finalizers, could we run into the same issue with those as we did with the SpiffeID CRDs?

Contributor Author

I see finalizers mentioned in the repo, but I don't see them being attached to any resources the way they were in the k8s-workload-registrar. So I don't think you'll run into that issue. @azdagron can you confirm?


I wasn't sure, because I saw it in the RBAC permissions.

Member

Those were added by default with kubebuilder. We don't currently make use of finalizers.
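
For anyone who wants to double-check this in their own cluster, a quick sketch, assuming the standard ClusterSPIFFEID CRD installed by spire-controller-manager:

```
# List ClusterSPIFFEID resources and any finalizers attached to them.
# An empty FINALIZERS column means nothing will block deletion.
kubectl get clusterspiffeids.spire.spiffe.io \
  -o custom-columns=NAME:.metadata.name,FINALIZERS:.metadata.finalizers
```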



If you require these DNS Names to be automatically populated, please open a [new issue](https://github.com/spiffe/spire-controller-manager/issues/new) with your use case.


Issue created for this: #48

Contributor Author

Updated the README with this ticket. Is this a showstopper for migrating NGINX Service Mesh?


Yes, due to the unpredictability of how a user may choose to label their Pods. I also created a second blocking issue that was discovered while testing, #49, though it is unrelated to this guide.

Contributor
@marcofranssen commented Jan 19, 2023

Bumped into the same and commented on the issue to give my two cents: #48 (comment)

@marcofranssen
Contributor

FYI: Using this guide I was able to build the spire-controller-manager into this Helm chart:
philips-labs/helm-charts#102


## Introduction

This guide will walk you through how to migrate an existing Kubernetes Workload Registrar deployment to SPIRE Controller Manager. Existing entries created by the Kubernetes Workload Registrar aren't compatible with SPIRE Controller Manager so they'll be deleted and replaced with new entries. Workloads will continue to function with the old entries until their certificates expire, after which they'll get new certificates based on the new entries.

Member

> Workloads will continue to function with the old entries until their certificates expire

This isn't quite accurate. Workloads should continue to function, but the SVIDs will be actively rotated and pushed down to workloads as soon as the Agent receives the new entries.

Contributor Author

Updated.

> **Note**
> As we'll be deleting and creating entries, it's important to do this migration during a downtime window.
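
Before the downtime window starts, it can help to snapshot the entries the registrar created so they can be compared against what SPIRE Controller Manager creates afterwards. A minimal sketch; the spire-server-0 pod and spire-server container names are assumptions based on a typical StatefulSet install:

```
# Dump the current registration entries for later comparison.
kubectl exec -n spire spire-server-0 -c spire-server -- \
  /opt/spire/bin/spire-server entry show > entries-before-migration.txt
```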

## Clean up Kubernetes Workload Registrar Resources

Member

When the CRDs, and the accompanying entries, are deleted, workloads will see PermissionDenied returned from the Workload API. Spec-conformant workloads will interpret this as "I don't have any SVIDs" and SHOULD proactively unload/stop using previously received materials. Doing this step before the controller manager has created new entries will cause a service interruption for these workloads.

See the last paragraph under https://github.com/spiffe/spiffe/blob/main/standards/SPIFFE_Workload_API.md#521-fetchx509svid.

We might need a way to stop the registrar, deploy the controller manager, and then somehow remove the CRDs. The controller manager will remove the entries.

If we need to add something to the registrar to help facilitate this (like a "don't create new entries" mode) that still allows the finalizer and cleanup to run, we can add that.

Contributor Author

That's going to be tricky to have them running side by side. Perhaps we can just strip the finalizers by patching the CRDs after the registrar is removed?

Member

That could work too. If there is a straightforward way to do that, that seems easier than adding a flag to the k8s-workload-registrar.

Contributor Author

This script does the trick:

```
# Strip the finalizers from the SpiffeId resources in every namespace so they
# (and later the CRD) can be deleted once the registrar is removed.
for ns in $(kubectl get ns | awk '{print $1}' | tail -n +2)
do
  if [ $(kubectl get spiffeids -n $ns 2>/dev/null | wc -l) -ne 0 ]
  then
    kubectl patch spiffeid $(kubectl get spiffeids -n $ns | awk '{print $1}' | tail -n +2) --type='merge' -p '{"metadata":{"finalizers":null}}' -n $ns
  fi
done
```
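
Once the finalizers are cleared and the registrar is gone, the leftover CRD itself can be removed. A hedged example; the CRD name assumes the standard CRD-mode install of k8s-workload-registrar:

```
# Delete the SpiffeId CRD left behind by the Kubernetes Workload Registrar.
kubectl delete crd spiffeids.spiffeid.spiffe.io
```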

> **Note**
> This will create entries for every Pod in the system. It's better to restrict it with a label like in the main example in `config/clusterspiffeid.yaml`. Also see the [ClusterSPIFFEID definition](https://github.com/spiffe/spire-controller-manager/blob/main/docs/clusterspiffeid-crd.md) for more information.
Member

> It's better to restrict it

I think possibly a better approach would be to just explain the pros and cons and let folks decide for themselves which is better according to their circumstances. There are valid scenarios where you don't need to restrict by label.

Contributor Author

Updated note.
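
To make that trade-off concrete, here is a minimal sketch of a label-restricted ClusterSPIFFEID along the lines discussed above. The spiffe.io/spire-managed-identity label and the example-labeled name are illustrative, not something the guide mandates:

```
# Apply a ClusterSPIFFEID that only matches labeled Pods instead of every Pod.
kubectl apply -f - <<EOF
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
  name: example-labeled
spec:
  spiffeIDTemplate: "spiffe://{{ .TrustDomain }}/ns/{{ .PodMeta.Namespace }}/sa/{{ .PodSpec.ServiceAccountName }}"
  podSelector:
    matchLabels:
      spiffe.io/spire-managed-identity: "true"
EOF
```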

ignoreNamespaces:
- kube-system
- kube-public
- istio-system

Potentially remove Istio-specific references to avoid confusion

Contributor Author

I think it's OK to have in this case, as we don't want to generate certs for core Istio services.


My thought was that since Istio isn't mentioned anywhere else in the migration guide, and I don't think it's a requirement for it, it might be confusing to include. I think it is totally fine to leave in, just a suggestion :).

azdagron previously approved these changes Feb 14, 2023

@azdagron (Member) left a comment

Looks great. Thanks, @faisal-memon!

@azdagron
Member

Just need DCO fixed up and we can merge this.

faisal-memon and others added 19 commits February 14, 2023 13:15
@faisal-memon
Contributor Author

> Just need DCO fixed up and we can merge this.

DCO fixed.
