
Conversation

faisal-memon
Contributor

Provides steps for users of Kubernetes Workload Registrar to transition to SPIRE Controller Manager.

  • Uses the spire namespace because that's what Kubernetes Workload Registrar users are likely using. Newer examples use spire-system. Should we steer users there?
  • The examples are all static YAML files. Can switch to kustomize if that's preferred.

Addresses: spiffe/spire#3614

@keeganwitt
Contributor

Do you think it's confusing to have 2 sets of configs in the project? One in the migration directory and one in demo? Should migration reference demo configs instead?

kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.8.0
Contributor

Should this annotation be present?

Contributor Author

It's auto-generated, so I'd rather not remove it.

kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.8.0
Contributor

Should this annotation be present?

@keeganwitt
Contributor

keeganwitt commented Jan 9, 2023

I think this guide should mention what to do if you were creating "downstream" SPIFFEIDs before with the Registrar. Is this not supported yet? EDIT: Filed #43 for this.

@faisal-memon
Contributor Author

> I think this guide should mention what to do if you were creating "downstream" SPIFFEIDs before with the Registrar. Is this not supported yet? EDIT: Filed #43 for this.

@keeganwitt Thanks for catching this. Ill update the FAQ to reflect that part is missing.

@faisal-memon
Contributor Author

> Do you think it's confusing to have 2 sets of configs in the project? One in the migration directory and one in demo? Should migration reference demo configs instead?

@keeganwitt A lot of the config in here is copied from the demo/ and docs/ directories. The demo directory is a more complex setup with 2 clusters, and I wanted this to be simpler. I'm thinking to push this forward as is so we can deprecate k8s-workload-registrar and open a follow-up PR to reduce duplication and clean up the docs. What do you think?

@keeganwitt
Contributor

> @keeganwitt A lot of the config in here is copied from the demo/ and docs/ directories. The demo directory is a more complex setup with 2 clusters, and I wanted this to be simpler. I'm thinking to push this forward as is so we can deprecate k8s-workload-registrar and open a follow-up PR to reduce duplication and clean up the docs. What do you think?

I'm in favor of merging it soon. Incomplete/imperfect documentation is better than none at all.

We won't move off Registrar until it supports downstream/parentId in the CRD. Otherwise, we have to manually set up downstream agents rather than leveraging our Helm deployment. Do you disagree that that level of feature parity should be present before official deprecation?

@faisal-memon
Contributor Author

> I'm in favor of merging it soon. Incomplete/imperfect documentation is better than none at all.
>
> We won't move off Registrar until it supports downstream/parentId in the CRD. Otherwise, we have to manually set up downstream agents rather than leveraging our Helm deployment. Do you disagree that that level of feature parity should be present before official deprecation?

I think the downstream PR is a good one to merge prior to deprecating. In general I don't think we need full feature parity as long as we have alternatives/workarounds to at least move things forward. I try to cover this in the FAQ section of this PR. For example, I don't think we need to support annotation-based registration going forward. Is there a specific feature aside from downstream you think we need to support prior to deprecating k8s-workload-registrar?

@keeganwitt
Contributor

> I think the downstream PR is a good one to merge prior to deprecating. In general I don't think we need full feature parity as long as we have alternatives/workarounds to at least move things forward. I try to cover this in the FAQ section of this PR. For example, I don't think we need to support annotation-based registration going forward. Is there a specific feature aside from downstream you think we need to support prior to deprecating k8s-workload-registrar?

We don't need parity for features that are replaced with other concepts. We weren't using the annotation-based registration before, but the guide should probably address that also (and any others you're aware of). The two features we'd like are the ability to specify the downstream flag and the ability to specify the parent ID.

  bindAddress: 127.0.0.1:8083
leaderElection:
  leaderElect: true
  resourceName: 98c9c988.spiffe.io


What is this name referring to?

Contributor Author

Good question, it's the name of the ConfigMap that's created to do leader election. It's defined here: https://pkg.go.dev/k8s.io/component-base/config/v1alpha1#LeaderElectionConfiguration


Will it always have this name?

Contributor Author

You can use a different name if you like.


Oh I see, wasn't actually sure who generated that ConfigMap. Good to know that we control the name.

Contributor

I did the following in my Helm chart.

resourceName: {{ .Release.Name | sha256sum | trunc 8 }}.spiffe.io
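
For anyone verifying the leader-election lock during a migration, a minimal check is sketched below. The spire namespace and the 98c9c988.spiffe.io name come from the example config above; depending on the controller-runtime version, the lock may be stored as a Lease, a ConfigMap, or both, so one of these commands may return NotFound.

```
# Inspect the lock object that spire-controller-manager creates for leader
# election; its name matches leaderElection.resourceName in the config.
kubectl -n spire get lease 98c9c988.spiffe.io
kubectl -n spire get configmap 98c9c988.spiffe.io
```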

```
2022-12-13T00:41:21.844Z INFO webhook-manager Webhook configuration patched with CABundle
```

### I'm using CRD mode Kubernetes Workload Registrar and it gets stuck deleting the SpiffeId CRD. What do I do?


Since the new CRDs also appear to have finalizers, could we run into the same issue with those as we did with the SpiffeID CRDs?

Contributor Author

I see finalizers mentioned in the repo, but I don't see them being attached to any resources the way they were in the k8s-workload-registrar. So I don't think you'll run into that issue. @azdagron can you confirm?


I wasn't sure, because I saw it in the RBAC permissions.

Member

Those were added by default with kubebuilder. We don't currently make use of finalizers.
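
For anyone who wants to double-check this in their own cluster, a quick sketch, assuming the standard ClusterSPIFFEID CRD installed by spire-controller-manager:

```
# List ClusterSPIFFEID resources and any finalizers attached to them.
# An empty FINALIZERS column means nothing will block deletion.
kubectl get clusterspiffeids.spire.spiffe.io \
  -o custom-columns=NAME:.metadata.name,FINALIZERS:.metadata.finalizers
```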



If you require these DNS Names to be automatically populated, please open a [new issue](https://github.com/spiffe/spire-controller-manager/issues/new) with your use case.


Issue created for this: #48

Contributor Author

Updated the README with this ticket. Is this a showstopper for migrating NGINX Service Mesh?


Yes, due to the unpredictability of how a user may choose to label their Pods. I also created a second blocking issue that was discovered while testing, #49, though it is unrelated to this guide.

Contributor
@marcofranssen commented Jan 19, 2023

Bumped into the same and commented on the issue to give my two cents: #48 (comment)

@marcofranssen
Contributor

FYI: Using this guide I was able to build the spire-controller-manager into this Helm chart:
philips-labs/helm-charts#102


## Introduction

This guide will walk you through how to migrate an existing Kubernetes Workload Registrar deployment to SPIRE Controller Manager. Existing entries created by the Kubernetes Workload Registrar aren't compatible with SPIRE Controller Manager so they'll be deleted and replaced with new entries. Workloads will continue to function with the old entries until their certificates expire, after which they'll get new certificates based on the new entries.

Member

> Workloads will continue to function with the old entries until their certificates expire

This isn't quite accurate. Workloads should continue to function, but the SVIDs will be actively rotated and pushed down to workloads as soon as the Agent receives the new entries.

Contributor Author

Updated.

> **Note**
> As we'll be deleting and creating entries, it's important to do this migration during a downtime window.
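
Before the downtime window starts, it can help to snapshot the entries the registrar created so they can be compared against what SPIRE Controller Manager creates afterwards. A minimal sketch; the spire-server-0 pod and spire-server container names are assumptions based on a typical StatefulSet install:

```
# Dump the current registration entries for later comparison.
kubectl exec -n spire spire-server-0 -c spire-server -- \
  /opt/spire/bin/spire-server entry show > entries-before-migration.txt
```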

## Clean up Kubernetes Workload Registrar Resources

Member

When the CRDs, and the accompanying entries, are deleted, workloads will see PermissionDenied returned from the Workload API. Spec-conformant workloads will interpret this as "I don't have any SVIDs" and SHOULD proactively unload/stop using previously received materials. Doing this step before the controller manager has created new entries will cause a service interruption for these workloads.

See the last paragraph under https://github.com/spiffe/spiffe/blob/main/standards/SPIFFE_Workload_API.md#521-fetchx509svid.

We might need a way to stop the registrar, deploy the controller manager, and then somehow remove the CRDs. The controller manager will remove the entries.

If we need to add something to the registrar to help facilitate this (like a "don't create new entries" mode) that still allows the finalizer and cleanup to run, we can add that.

Contributor Author

That's going to be tricky to have them running side by side. Perhaps we can just strip the finalizers by patching the CRDs after the registrar is removed?

Member

That could work too. If there is a straightforward way to do that, that seems easier than adding a flag to the k8s-workload-registrar.

Contributor Author

This script does the trick:

```
# Strip the finalizers from the SpiffeId resources in every namespace so they
# (and later the CRD) can be deleted once the registrar is removed.
for ns in $(kubectl get ns | awk '{print $1}' | tail -n +2)
do
  if [ $(kubectl get spiffeids -n $ns 2>/dev/null | wc -l) -ne 0 ]
  then
    kubectl patch spiffeid $(kubectl get spiffeids -n $ns | awk '{print $1}' | tail -n +2) --type='merge' -p '{"metadata":{"finalizers":null}}' -n $ns
  fi
done
```
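
Once the finalizers are cleared and the registrar is gone, the leftover CRD itself can be removed. A hedged example; the CRD name assumes the standard CRD-mode install of k8s-workload-registrar:

```
# Delete the SpiffeId CRD left behind by the Kubernetes Workload Registrar.
kubectl delete crd spiffeids.spiffeid.spiffe.io
```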

> **Note**
> This will create entries for every Pod in the system. It's better to restrict it with a label like in the main example in `config/clusterspiffeid.yaml`. Also see the [ClusterSPIFFEID definition](https://github.com/spiffe/spire-controller-manager/blob/main/docs/clusterspiffeid-crd.md) for more information.
Member

> It's better to restrict it

I think possibly a better approach would be to just explain the pros and cons and let folks decide for themselves which is better according to their circumstances. There are valid scenarios where you don't need to restrict by label.

Contributor Author

Updated note.
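
To make that trade-off concrete, here is a minimal sketch of a label-restricted ClusterSPIFFEID along the lines discussed above. The spiffe.io/spire-managed-identity label and the example-labeled name are illustrative, not something the guide mandates:

```
# Apply a ClusterSPIFFEID that only matches labeled Pods instead of every Pod.
kubectl apply -f - <<EOF
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
  name: example-labeled
spec:
  spiffeIDTemplate: "spiffe://{{ .TrustDomain }}/ns/{{ .PodMeta.Namespace }}/sa/{{ .PodSpec.ServiceAccountName }}"
  podSelector:
    matchLabels:
      spiffe.io/spire-managed-identity: "true"
EOF
```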

ignoreNamespaces:
- kube-system
- kube-public
- istio-system

Potentially remove Istio-specific references to avoid confusion

Contributor Author

I think it's OK to have in this case, as we don't want to generate certs for core Istio services.


My thought was that since Istio isn't mentioned anywhere else in the migration guide, and I don't think it's a requirement for it, it might be confusing to include. I think it is totally fine to leave in, just a suggestion :).

azdagron previously approved these changes Feb 14, 2023

@azdagron (Member) left a comment

Looks great. Thanks, @faisal-memon!

@azdagron
Member

Just need DCO fixed up and we can merge this.

faisal-memon and others added 19 commits February 14, 2023 13:15
@faisal-memon
Contributor Author

> Just need DCO fixed up and we can merge this.

DCO fixed.
