
@u-kai
Contributor

@u-kai u-kai commented Jun 27, 2025

What does it do?

This PR resolves RBAC permission issues when using Gateway API sources with namespaced: true configuration in the Helm chart.
It implements proper conditional RBAC creation that supports both same-namespace and cross-namespace gateway access scenarios while maintaining backward compatibility.

Motivation

Fixes #5300 - Gateway API sources require ClusterRole permissions when using namespaced: true, but the current implementation creates insufficient Role permissions, causing external-dns to fail with RBAC errors.

Problem: When namespaced: true is set with gateway sources, external-dns needs:

  • Namespace informer access (ClusterRole for namespaces resource)
  • Gateway resource access (varies based on gatewayNamespace configuration)

Root Cause: The namespace informer relies on the NamespacesFromSelector functionality, which requires cluster-wide read access to namespaces, but namespaced: true only creates namespace-scoped Role permissions.
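Here is why namespace reads are needed at all. A Gateway listener can admit routes with `allowedRoutes.namespaces.from: Selector`, and that label selector is matched against the labels of the route's Namespace object. The minimal Go sketch below models only that check. `routeAllowed` and the in-memory `namespaceLabels` map are hypothetical stand-ins for external-dns's namespace informer, and only `matchLabels` is handled, not `matchExpressions`. Any route namespace may need to be looked up, so the informer has to list and watch Namespace objects cluster-wide. A namespaced Role cannot grant that.

```go
package main

import "fmt"

// namespaceLabels is a hypothetical stand-in for the Namespace objects an
// informer would cache. Reading them is what needs cluster-wide list/watch.
var namespaceLabels = map[string]map[string]string{
	"team-a":  {"expose-via-gateway": "true"},
	"team-b":  {},
	"gateway": {"expose-via-gateway": "true"},
}

// matchLabels reports whether every key/value in selector is present in labels.
// Only the matchLabels part of a LabelSelector is modelled; empty matches all.
func matchLabels(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// routeAllowed mirrors allowedRoutes.namespaces.from: "All" admits any
// namespace, "Same" only the Gateway's own namespace, and "Selector" needs
// the labels of the route's namespace, hence the namespace informer.
func routeAllowed(from, gatewayNS, routeNS string, selector map[string]string) bool {
	switch from {
	case "All":
		return true
	case "Same":
		return routeNS == gatewayNS
	case "Selector":
		labels, ok := namespaceLabels[routeNS]
		return ok && matchLabels(selector, labels)
	default:
		return false
	}
}

func main() {
	sel := map[string]string{"expose-via-gateway": "true"}
	fmt.Println(routeAllowed("Selector", "gateway", "team-a", sel))
	fmt.Println(routeAllowed("Selector", "gateway", "team-b", sel))
}
```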

Solution

Implements a split RBAC approach with conditional logic:

Scenarios Supported:

  1. namespaced=false + gateway sources → ClusterRole with all permissions
  2. namespaced=true + gateway sources + no gatewayNamespace → Main Role (with gateway permissions) + ClusterRole for namespaces
  3. namespaced=true + gateway sources + gatewayNamespace specified → Main Role + ClusterRole for namespaces + Cross-namespace Gateway Role
  4. namespaced=false/true + no gateway sources → Standard behavior (unchanged)

More

  • Yes, this PR title follows Conventional Commits
  • Yes, I added unit tests
  • Yes, I updated end user documentation accordingly

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. chart labels Jun 27, 2025
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 27, 2025
@k8s-ci-robot
Contributor

Hi @u-kai. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 27, 2025
Member

@ivankatliarchuk ivankatliarchuk left a comment


Not sure if this is the correct solution. Why does the gateway source have a distinct flag when the rest of the sources rely on --namespace? Is there a specific reason, or is it a mistake?

```yaml
- --namespace={{ .Release.Namespace }}
{{- end }}
{{- if .Values.gatewayNamespace }}
- --gateway-namespace={{ .Values.gatewayNamespace }}
```
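Helm templates are built on Go's `text/template`, so you can check what this excerpt renders by running it through that package directly. The review excerpt does not show the `{{- if .Values.namespaced }}` guard before the first line or the closing `{{- end }}` after the last one, so both are assumptions in this sketch. `renderArgs` and the value layout are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// argsTmpl mirrors the chart excerpt. The opening `if .Values.namespaced`
// guard and the final `end` are assumptions; the excerpt does not show them.
const argsTmpl = `
{{- if .Values.namespaced }}
- --namespace={{ .Release.Namespace }}
{{- end }}
{{- if .Values.gatewayNamespace }}
- --gateway-namespace={{ .Values.gatewayNamespace }}
{{- end }}`

// renderArgs executes the template against Helm-like values and returns the
// rendered container args, one per line.
func renderArgs(namespaced bool, releaseNS, gatewayNS string) (string, error) {
	t, err := template.New("args").Parse(argsTmpl)
	if err != nil {
		return "", err
	}
	data := map[string]any{
		"Release": map[string]any{"Namespace": releaseNS},
		"Values": map[string]any{
			"namespaced":       namespaced,
			"gatewayNamespace": gatewayNS,
		},
	}
	var b strings.Builder
	if err := t.Execute(&b, data); err != nil {
		return "", err
	}
	return strings.TrimSpace(b.String()), nil
}

func main() {
	out, err := renderArgs(true, "external-dns", "infra")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```

The two flags render independently. An empty `gatewayNamespace` drops the second arg entirely, which leaves the Gateway lookup cluster-wide.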
Member


I think one of the issues is that Gateway uses --gateway-namespace when it should be unified; --namespace alone should be enough. The separate flag just adds confusion.

@u-kai
Contributor Author

u-kai commented Jun 27, 2025

@ivankatliarchuk
Thanks for the feedback!

The separate --gateway-namespace flag aligns with Gateway API's design for cross-namespace routing, where Gateways (often managed by cluster operators) and Routes (managed by app teams) typically reside in different namespaces.

The flag already exists in external-dns, so this change keeps the chart consistent with the existing CLI interface while fixing the RBAC permissions for namespaced deployments.

@ivankatliarchuk
Member

ivankatliarchuk commented Jun 28, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 28, 2025
@ivankatliarchuk
Member

I found an initial PR #2292

As a short-term measure it seems like a fix, but it's not clear.

Long-term solutions:

In my opinion, if --namespace is set, only resources in that exact namespace should be watched. Cross-namespace resource references should be ignored, with a message that the resource is outside the current namespace scope. Otherwise, it should be either all namespaces, or we need to add multiple-namespace support.

@ivankatliarchuk
Member

Related issues:

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 29, 2025
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 29, 2025
@u-kai
Contributor Author

u-kai commented Jun 29, 2025

@ivankatliarchuk
I agree that supporting multiple namespaces with the --namespace option would be a valuable improvement in the long term.

That said, I’d like to clarify one point: does this proposal also imply deprecating the --gateway-namespace option or folding it into --namespace?

If so, my personal opinion is that --gateway-namespace still has value and should remain.
When supporting multiple namespaces, we’ll likely need to create one informer per namespace. Without a --gateway-namespace option, we wouldn't know which namespace the Gateway actually exists in, which could result in unnecessary informer overhead, especially when the Gateway is deployed in a single, shared namespace (as is often the case in common Gateway setups).

In such cases, having a dedicated --gateway-namespace provides a useful optimization and avoids wasteful resource watching.

So in conclusion, I’m in favor of enabling multi-namespace support for --namespace, but I think retaining --gateway-namespace is also beneficial from a performance and configuration clarity standpoint.

@ivankatliarchuk
Member

@mloiseleur wdyt?

@mloiseleur
Collaborator

The External DNS documentation on flags says:

  • --namespace: Limit resources queried for endpoints to a specific namespace (default: all namespaces)
  • --gateway-namespace: Limit Gateways of Route endpoints to a specific namespace (default: all namespaces)
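Read literally, the two flags filter independently. This sketch assumes that for gateway sources the "resources queried for endpoints" are the routes, so `--namespace` limits the routes that are read and `--gateway-namespace` limits the Gateways they attach to. It uses a hypothetical `inScope` helper, and an empty flag value means "all namespaces", as in the documented defaults.

```go
package main

import "fmt"

// inScope applies the two documented filters to a route in routeNS that is
// attached to a Gateway in gatewayNS. An empty flag value means "all
// namespaces". The helper is hypothetical, not external-dns code.
func inScope(namespaceFlag, gatewayNamespaceFlag, routeNS, gatewayNS string) bool {
	routeOK := namespaceFlag == "" || routeNS == namespaceFlag
	gatewayOK := gatewayNamespaceFlag == "" || gatewayNS == gatewayNamespaceFlag
	return routeOK && gatewayOK
}

func main() {
	// --namespace=apps --gateway-namespace=infra: routes are read from "apps"
	// and Gateways from "infra", so read access is needed in both namespaces.
	fmt.Println(inScope("apps", "infra", "apps", "infra"))
	fmt.Println(inScope("apps", "infra", "other", "infra"))
}
```

With `--namespace=apps --gateway-namespace=infra`, reads span two namespaces. That is the cross-namespace case the PR's extra Role is meant to cover.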

With the current state of External DNS, this PR looks valid to me. It allows users to use the same namespace or a different one, with matching option names between the binary and the chart. For instance, a user may want to use the External DNS CRD in the external-dns namespace and Gateways in gateway namespaces.

@ivankatliarchuk An answer to your idea would be to implement a --namespaces option that accepts multiple namespaces. Then it would make sense to remove namespace options tailored to a specific source. But that's clearly beyond the scope of this PR.

Collaborator

@mloiseleur mloiseleur left a comment


@u-kai About the implementation, why are you adding a specific ClusterRole and ClusterRoleBinding? Wouldn't it be simpler to use the same CR & CRB and just extend it with the required permissions?

@ivankatliarchuk
Member

Makes sense

@u-kai
Contributor Author

u-kai commented Jul 4, 2025

@mloiseleur
Thank you for the feedback!

Let me explain the implementation. For namespaced: true configurations, multiple Role/ClusterRole resources are necessary for the following reasons:

Technical Requirements:

  1. Base Permissions - When namespaced: true, typically a Role (not ClusterRole) is created for namespace-scoped permissions
  2. Namespace Informer - When using Gateway API sources like HTTPRoute, cluster-wide namespace read access is required for the NamespacesFromSelector functionality
  3. Principle of Least Privilege - We should only grant permissions when actually needed:
      • namespaced: true → base permissions via Role
      • Gateway API sources → additional ClusterRole for namespace access only when used
  4. Cross-namespace Gateway - In practice, Gateways often exist in different namespaces than Routes (e.g., infrastructure namespace vs application namespace). When users want this setup (e.g., gatewayNamespace: default), a separate Role is needed for cross-namespace access.

Implementation Approach:

  • Base permissions: Role (when namespaced)
  • Namespace access: ClusterRole (only when Gateway sources are used)
  • Cross-namespace Gateway: Role in target namespace (only when specified)

This design grants exactly the permissions needed for each scenario while maintaining security isolation.
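The least-privilege argument can be checked mechanically: anything granted through a ClusterRole should be limited to `namespaces`. This Go sketch assumes read-only verbs (`get`, `list`, `watch`) and uses trimmed local copies of the RBAC `PolicyRule` fields instead of the `k8s.io/api/rbac/v1` package. `gatewayExtras` and `clusterScopedResources` are hypothetical, and they model only the objects added on top of the base Role.

```go
package main

import "fmt"

// PolicyRule is a trimmed local copy of the rbac/v1 PolicyRule fields used here.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// Object is one rendered RBAC object; Namespace is empty for cluster scope.
type Object struct {
	Kind      string
	Namespace string
	Rules     []PolicyRule
}

// readOnly is the assumed verb set; it is shared but never mutated.
var readOnly = []string{"get", "list", "watch"}

// gatewayExtras sketches the objects added on top of the base Role for a
// namespaced deployment with gateway sources.
func gatewayExtras(gatewayNamespace string) []Object {
	objs := []Object{{
		Kind:  "ClusterRole",
		Rules: []PolicyRule{{APIGroups: []string{""}, Resources: []string{"namespaces"}, Verbs: readOnly}},
	}}
	if gatewayNamespace != "" {
		objs = append(objs, Object{
			Kind:      "Role",
			Namespace: gatewayNamespace,
			Rules:     []PolicyRule{{APIGroups: []string{"gateway.networking.k8s.io"}, Resources: []string{"gateways"}, Verbs: readOnly}},
		})
	}
	return objs
}

// clusterScopedResources collects every resource granted through a ClusterRole.
func clusterScopedResources(objs []Object) []string {
	var out []string
	for _, o := range objs {
		if o.Kind != "ClusterRole" {
			continue
		}
		for _, r := range o.Rules {
			out = append(out, r.Resources...)
		}
	}
	return out
}

func main() {
	objs := gatewayExtras("infra")
	fmt.Println(len(objs), clusterScopedResources(objs))
}
```

Keeping the Gateway rule in a namespaced Role means the only cluster-wide grant is namespace reads, whatever `gatewayNamespace` is set to.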

@mloiseleur
Collaborator

/assign @stevehipwell
for review

Contributor

@stevehipwell stevehipwell left a comment


Thanks for the PR @u-kai. I've added a comment suggesting an improvement, but as I'd like to include this in the next release, we can leave that for the next time we need to make changes.

/approve

```yaml
{{/*
Check if any Gateway API sources are enabled
*/}}
{{- define "external-dns.hasGatewaySources" -}}
{{- if or (has "gateway-httproute" .Values.sources) (has "gateway-grpcroute" .Values.sources) (has "gateway-tlsroute" .Values.sources) (has "gateway-tcproute" .Values.sources) (has "gateway-udproute" .Values.sources) -}}
```
Contributor


Why not use hasPrefix in a range loop so the code is less likely to need updating in the future?
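The range-loop suggestion can be sketched with Go's `text/template`. Inside Helm, `hasPrefix` comes from Sprig and takes the prefix first (`hasPrefix "cat" "catch"`). Here it is re-implemented with the same argument order so the snippet runs without Helm. The excerpt stops after the `if or (...)` line, so the helper emitting `true` is an assumption, and `hasGatewaySources` is a hypothetical wrapper.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// helperTmpl rewrites the chart helper as a range loop over .Values.sources.
// Emitting "true" when a gateway source is found is an assumption.
const helperTmpl = `
{{- define "external-dns.hasGatewaySources" -}}
{{- $found := false -}}
{{- range .Values.sources -}}
{{- if hasPrefix "gateway-" . -}}{{- $found = true -}}{{- end -}}
{{- end -}}
{{- if $found -}}true{{- end -}}
{{- end -}}
{{- template "external-dns.hasGatewaySources" . -}}`

// funcs re-implements Sprig's hasPrefix with Sprig's argument order
// (prefix first, then the string), since Sprig is not in the stdlib.
var funcs = template.FuncMap{
	"hasPrefix": func(prefix, s string) bool { return strings.HasPrefix(s, prefix) },
}

// hasGatewaySources renders the helper for the given sources and reports
// whether it produced "true".
func hasGatewaySources(sources []string) (bool, error) {
	t, err := template.New("helpers").Funcs(funcs).Parse(helperTmpl)
	if err != nil {
		return false, err
	}
	var b strings.Builder
	data := map[string]any{"Values": map[string]any{"sources": sources}}
	if err := t.Execute(&b, data); err != nil {
		return false, err
	}
	return b.String() == "true", nil
}

func main() {
	ok, err := hasGatewaySources([]string{"service", "gateway-httproute"})
	fmt.Println(ok, err)
}
```

The trade-off is that any future source named `gateway-*` is picked up automatically, which is the point of the suggestion. The downside is that a non-route source with that prefix would match too.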

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 14, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: stevehipwell

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 14, 2025
@stevehipwell
Contributor

@mloiseleur @ivankatliarchuk could one of you please add the LGTM if you're happy with this?

Member

@ivankatliarchuk ivankatliarchuk left a comment


/lgtm

@k8s-ci-robot k8s-ci-robot merged commit a270a32 into kubernetes-sigs:master Jul 14, 2025
15 checks passed
troll-os pushed a commit to FiligranHQ/external-dns that referenced this pull request Aug 28, 2025
…ubernetes-sigs#5578)

* fix(helm): resolve RBAC permissions for namespaced gateway sources

* feat(helm): add support for gateway namespace in RBAC configuration

* chore(helm): update docs and fix formatting issues

* fix(helm): revert README changes and add gatewayNamespace docs

* chore lint fmt

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. chart cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.


Development

Successfully merging this pull request may close these issues.

--namespace still trying to read various cluster scope

5 participants