Conversation

@jschaul (Member) commented Mar 9, 2023

## Main change:

A new 'ingress-nginx-controller' wrapper chart is introduced, compatible with kubernetes versions [1.23 - 1.26]. The old chart 'nginx-ingress-controller' (compatible only up to k8s 1.19) is now DEPRECATED.
We advise upgrading the kubernetes version in use to 1.23 or higher (we tested on kubernetes 1.26) and switching to the new ingress controller chart. Main features:

  • up-to-date nginx version ('1.21.6')
  • TLS 1.3 support (including the ability to specify which cipher suites to use)
  • security fixes
  • no more accidental logging of Wire access tokens under specific circumstances ([jira ticket](https://wearezeta.atlassian.net/browse/SEC-47))

The 'kind: Ingress' resources installed via the 'nginx-ingress-services' chart remain compatible with both the old and the new ingress controller, and with k8s versions [1.18 - 1.26]. If you upgrade an existing kubernetes cluster in place (not recommended), you may need to first uninstall the old controller before installing the new controller chart.
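If you do replace the controller in place, a minimal sketch of the two steps might look like this (release names and the 'wire' namespace are taken from the helmfile example below; adjust both to your deployment):

```bash
# Remove the deprecated controller release first (helm 3 syntax).
helm uninstall nginx-ingress-controller --namespace wire

# Then install the new wrapper chart from the same chart repository.
helm install ingress-nginx-controller wire/ingress-nginx-controller --namespace wire
```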

In case you have custom overrides, you need to modify the directory name and top-level configuration key:

```diff
# If you have overrides for the controller chart (such as cipher suites), ensure to rename file and top-level key:
-# nginx-ingress-controller/values.yaml
+# ingress-nginx-controller/values.yaml
-nginx-ingress:
+ingress-nginx:
   controller:
     # ...
```

and double-check that all overrides you use are indeed provided under the same name by the upstream chart. See also the default overrides in [the default values.yaml](https://github.com/wireapp/wire-server/blob/develop/charts/ingress-nginx-controller/values.yaml).
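As an illustration, a renamed override file could look like the following sketch; the `ssl-protocols` and `ssl-ciphers` keys are ConfigMap options of the upstream ingress-nginx chart under `controller.config`, but the concrete values here are only examples:

```yaml
# ingress-nginx-controller/values.yaml (illustrative values, not defaults)
ingress-nginx:
  controller:
    config:
      ssl-protocols: "TLSv1.2 TLSv1.3"
      ssl-ciphers: "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384"
```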

In case you use helmfile, change your ingress controller release like this:

```diff
# helmfile.yaml
releases:
-  - name: 'nginx-ingress-controller'
+  - name: 'ingress-nginx-controller'
     namespace: 'wire'
-    chart: 'wire/nginx-ingress-controller'
+    chart: 'wire/ingress-nginx-controller'
     version: 'CHANGE_ME'
```
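After the rename, you can roll out just this release; a sketch using helmfile's built-in `name` label selector:

```bash
# Sync only the renamed ingress controller release.
helmfile -l name=ingress-nginx-controller sync
```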

For more information, read the documentation under https://docs.wire.com/how-to/install/ingress.html (or go to https://docs.wire.com and search for "ingress-nginx-controller").

## Other internal changes:

  • integration tests on CI will use either the old or the new ingress controller, depending on which kubernetes version they run on.
  • upgrade `kubectl` to the default from the nixpkgs channel (currently `1.26`) by removing the manual version pin on 1.19
  • upgrade `helmfile` to the default from the nixpkgs channel by removing the manual version pin
  • upgrade `helm` to the default from the nixpkgs channel by removing the manual version pin
  • add `kubelogin-oidc` so the `kubectl` in this environment can also talk to kubernetes clusters using OIDC (see the kubeconfig sketch below)
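As a rough sketch of what the last point enables, a kubeconfig user entry using the kubelogin exec plugin might look like this (the issuer URL and client ID are placeholders):

```yaml
# Hypothetical kubeconfig snippet; kubelogin is invoked as `kubectl oidc-login`.
users:
  - name: oidc-user
    user:
      exec:
        apiVersion: client.authentication.k8s.io/v1beta1
        command: kubectl
        args:
          - oidc-login
          - get-token
          - --oidc-issuer-url=https://issuer.example.com   # placeholder
          - --oidc-client-id=kubernetes                    # placeholder
```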

## Checklist

  • Add a new entry in an appropriate subdirectory of changelog.d
  • Read and follow the PR guidelines

@zebot zebot added the ok-to-test label Mar 9, 2023
@jschaul jschaul force-pushed the new-ingress-controller-chart branch 4 times, most recently from a4020bc to 0340b10 Compare March 16, 2023 14:42
@jschaul jschaul marked this pull request as ready for review March 16, 2023 15:56
@supersven (Contributor) left a comment:

Great job 👍

@jschaul jschaul force-pushed the new-ingress-controller-chart branch from 8b6db95 to db3e342 Compare March 21, 2023 12:25
@jschaul (Member, Author) commented Mar 21, 2023

CI tests now finally pass, both on a 1.19 cluster (kube-ci-legacy) and, tested manually, on a 1.26 cluster (the new kube-ci). Just keeping this PR open for potential feedback from others on the reviewer list.

On 1.26 kube-ci using the new chart wrapper:

```
===============
=== summary ===
===============
=== tail galley: ===

All 416 tests passed (77.44s)
=== tail cargohold: ===

All 21 tests passed (8.11s)
=== tail gundeck: ===

All 34 tests passed (155.61s)
=== tail spar: ===
Finished in 254.9702 seconds
553 examples, 0 failures, 65 pending
=== tail federator: ===
Finished in 0.4622 seconds
9 examples, 0 failures
=== tail brig: ===

All 491 tests passed (92.45s)
galley-integration passed ✅.
cargohold-integration passed ✅.
gundeck-integration passed ✅.
federator-integration passed ✅.
spar-integration passed ✅.
brig-integration passed ✅.
All integration tests passed ✅.
```

Following this PR, as well as #3154, I think we could also test using only TLS 1.3, and test using ciphers with bigger key sizes.
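For reference, a TLS 1.3-only setup could be sketched as an override on the new chart, reusing the upstream `ssl-protocols` ConfigMap key (untested here):

```yaml
# ingress-nginx-controller/values.yaml (sketch; restricts nginx to TLS 1.3)
ingress-nginx:
  controller:
    config:
      ssl-protocols: "TLSv1.3"
```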

@akshaymankar (Member) commented:

The federator client doesn't support TLS 1.3 yet. It requires implementing a C function in HsOpenSSL.

@comawill (Contributor) left a comment:

Awesome 👍 LGTM

Comment on lines 10 to 21
```yaml
# -- Use a `DaemonSet` or `Deployment`
kind: DaemonSet
service:
  type: NodePort # or LoadBalancer
  externalTrafficPolicy: Local
  nodePorts:
    # The nginx instance is exposed on ports 31773 (https) and 31772 (http)
    # on the node on which it runs. You should add a port-forwarding rule
    # on the node or on the loadbalancer that forwards ports 443 and 80 to
    # these respective ports.
    https: 31773
    http: 31772
```
A Contributor commented:
A more opportunistic default here would be to use kind: Deployment and type: LoadBalancer, and document how to configure this if you have to resort to NodePort.

The NodePort approach always requires manual configuration of some external load balancer/firewall to round-robin between node IPs and is error-prone. It's also a bit annoying to have to decide on some global ports that may not be used otherwise.

Most managed K8s clusters support LoadBalancers; you can also get this for your own clusters in hcloud etc. It's even possible on pure bare metal, without any "load balancer hardware", by using BGP, or by leadership election over who announces the "load balancer IP" via ARP (https://metallb.universe.tf/configuration/_advanced_l2_configuration/).
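For illustration, a minimal MetalLB L2 setup along those lines could look like this sketch (CRD-based configuration as in current MetalLB releases; the address range is a placeholder):

```yaml
# Pool of IPs MetalLB may hand out to Services of type LoadBalancer.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: ingress-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.0.2.10-192.0.2.15   # placeholder range from your network
---
# Announce those IPs via L2 (ARP) with leader election between nodes.
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: ingress-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - ingress-pool
```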

@jschaul (Member, Author) replied:

I swapped the defaults as you suggested and documented how to get the previous behaviour back.
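Based on the values quoted above, an override restoring the previous NodePort behaviour might look like this sketch:

```yaml
# Override for ingress-nginx-controller/values.yaml (sketch; ports as in the
# previous defaults quoted above).
ingress-nginx:
  controller:
    kind: DaemonSet
    service:
      type: NodePort
      externalTrafficPolicy: Local
      nodePorts:
        https: 31773
        http: 31772
```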

@jschaul jschaul merged commit 94808d1 into develop Mar 22, 2023
@jschaul jschaul deleted the new-ingress-controller-chart branch March 22, 2023 14:38
@jschaul jschaul mentioned this pull request Mar 22, 2023
supersven pushed a commit that referenced this pull request Mar 22, 2023
> You should add a port-forwarding rule on the node or on the loadbalancer that forwards ports 443 and 80 to these respective ports. Any traffic hitting the http port simply gets an HTTP 30x redirect to https.
>
> Downsides of this approach: The NodePort approach always requires manual configuration of some external load balancer/firewall to round-robin between node IPs and is error-prone. It's also a bit annoying to have to decide on some global ports that may not be used otherwise.
A Contributor commented:

We now do set some ports for the NodePort approach (31773 and 31772), even though we don't use them as long as the user doesn't explicitly set type: NodePort. We might want to remove them from values.yaml and explicitly let the user decide (and put them in the example in L63), so that this paragraph makes sense.

@jschaul (Member, Author) replied:

Good point. #3173


> Downsides of this approach: The NodePort approach always requires manual configuration of some external load balancer/firewall to round-robin between node IPs and is error-prone. It's also a bit annoying to have to decide on some global ports that may not be used otherwise.
>
> Most managed K8s clusters have support for LoadBalancers, you can also get this for your own clusters in hcloud etc. It's even possible to do it for pure bare metal, without any "load balancer hardware", by using BGP or some leadership election over who's announcing the "load balancer ip" via ARP (https://metallb.universe.tf/configuration/_advanced_l2_configuration/).
A Contributor commented:

This is okay for a github discussion, but a bit too sloppy for public docs.

What about:

> Most managed K8s clusters have support for LoadBalancers.
> Manually set up K8s clusters can also support this, by using a provider/environment-specific CCM (see hcloud and digitalocean for examples).
> In case you're provisioning on bare metal, without any hardware load balancer support in front, you might be using MetalLB, which supports BGP or failover L2 ARP announcements.
> The choice of CCM highly depends on the environment you choose to deploy wire-server in.


jschaul added a commit that referenced this pull request Mar 28, 2023
… too (#3138)

Fix a rate-limit exemption whereby authenticated endpoints did not get the unlimited_requests_endpoint setting, if set, applied. This is a concern for the webapp and calls to /assets, which can happen in large numbers on initial loading. A previous change in [this PR](#2786) had no effect.

This PR also increases default rate limits, to compensate for the [new ingress controller chart](#3140) defaulting to topologyAwareRouting.