Skip to content

Conversation

@jschaul
Copy link
Member

@jschaul jschaul commented Jul 13, 2021

This PR sets a reasonably secure TLS configuration for the GRPC client that federator uses to connect to other federators (or rather, nginx ingresses, assuming a wire-server instance on the other side).

Summary of changes

  • Enabled TLS validation and specifying a CA when creating a GRPC client
  • Added new configuration options for federator and corresponding docs (docs/reference/config-options.md)
  • Added integration test for the GRPC client against a local nginx playing the role of ingress
  • Changed end2end tests so that certificates for the ingress are created on the fly (using correct domains)
  • Added unit tests for the GRPC client using a mock http2/TLS server

Remaining questions

  • Do we need to support TLS 1.2?
  • Should we use a list of "blessed" ciphers?

Checklist

  • The PR Title explains the impact of the change.
  • The PR description provides context as to why the change should occur and what the code contributes to that effect. This could also be a link to a JIRA ticket or a Github issue, if there is one.
  • If end-points have been added or changed: the endpoint / config-flag checklist (see Wire-employee only backend wiki page) has been followed.
  • If a schema migration has been added, I ran make git-add-cassandra-schema to update the cassandra schema documentation.
  • Section Unreleased of CHANGELOG.md contains the following bits of information:
    • A line with the title and number of the PR in one or more suitable sub-sections.
    • If /a: measures to be taken by instance operators.
    • If /a: list of cassandra migrations.
    • If public end-points have been changed or added: does nginz need upgrade?
    • If internal end-points have been added or changed: which services have to be deployed in a specific order?

@jschaul jschaul changed the title WIP: perform some TLS validation when talking to remote servers WIP: Federation: perform some TLS validation when talking to remote servers Jul 13, 2021
@jschaul
Copy link
Member Author

jschaul commented Jul 13, 2021

Interestingly, the end2end tests show weirdly-encoded errors as a 500 server error with just the first commit of this PR:

        executeSearchWithDomain, called at test/integration/API/Search/Util.hs:88:26 in main:API.Search.Util
        assertCanFindWithDomain, called at test/integration/Federation/End2end.hs:135:3 in main:Federation.End2end

    get users by ids on multiple backends:                                      FAIL
      Exception: Error executing request: HttpExceptionRequest Request {
        host                 = "brig.test-izcvydyodvs4.svc.cluster.local"
        port                 = 8080
        secure               = False
        requestHeaders       = [("Z-User","91fa9030-7347-4572-8992-9d1aebf3e1f9"),("Content-Type","application/json"),("Content-Type","application/json"),("Accept","application/json")]
        path                 = "list-users"
        queryString          = ""
        method               = "POST"
        proxy                = Nothing
        rawBody              = False
        redirectCount        = 10
        responseTimeout      = ResponseTimeoutDefault
        requestVersion       = HTTP/1.1
      }

       (StatusCodeException (Response {responseStatus = Status {statusCode = 500, statusMessage = "Internal Server Error"},
 responseVersion = HTTP/1.1, responseHeaders = [("Transfer-Encoding","chunked"),("Date","Tue, 13 Jul 2021 14:12:07 GMT"),("Server","Warp/3.3.13"),("Content-Type","application/json")], 
responseBody = (), responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose}) "{\"code\":500,\"message
\":\"\\u001f\239\191\189\\u0008\\u0000\\u0000\\u0000\\u0000\\u0000\\u0004\\u0003\239\191\189\239\191\189Mk\\u00021
\\u0010\239\191\189\239\191\189J\239\191\189\239\191\189\239\191\189f\239\191\189\239\191\189^\\u0016<\\u0014\239\191
\189\\u001f\208\138\239\191\189=uK\\u0019\239\191\189\239\191\189M\239\191\189&K2k\239\191\189J\239\191\189{\239
\191\189\239\191\189\\\"\239\191\189;\239\191\189\&0y\239\191\189y\239\191\189=\239\191\189FF(\\u000e\239\191\189 WP\239\191\189hC\239\191\189\\u0002\239\191\189\239\191\189n\239\191\189%\239\191\189m\239\191\189\\u0010\229\186\147FG\239\191\189@\239\191\189\\u001a\239\191\189K\239\191\189\\u0019\239\191\189L\239\191\189eE\239\191\189\239\191\189\239\191\189\\u001f\239\191\189\239\191\189K\239\191\189;\239\191\189y\239\191\189\239\191\189\239\191\189\&2\239\191\189Wy\220\171\\\\\239\191\189\&6rB\239\191\189Wh\239\191\189\239\191\189\239\191\189\239\191\189.D\239\191\189\239\191\189\239\191\189\239\191\189:-o\239\191\189\239\191\189\\u000cj\239\191\189\\u0011\239\191\189=\239\191\189\\r\239\191\189\\u0012\\u0014\239\191\189\\u000f\239\191\189\239\191\189_.\239\191\189\\\"2r\\u001b\239\191\189q\239\191\189(L\239\191\189b\239\191\189\198\182\239\191\189\239\191\189\211\143\239\191\189\&5\239\191\189\239\191\189\239\191\189j\239\191\189\239\191\189\239\191\189>e\DEL\219\147qR\239\191\189\\u0003:\\u001d+\239\191\189\239\191\189]:$-\\u0006\239\191\189\239\191\189\239\191\189\239\191\189\\u0008\239\191\189\239\191\189\239\191\189V\\u000c\202\178\\u0004E\239\191\189\239\191\189\239\191\189\239\191\189 \\u0002}\239\191\189b\210\133x\239\191\189cM\239\191\189&\239\191\189\200\170\\u0012\239\191\189\239\191\189\239\191\189\239\191\189u\228\191\154\\u0012\239\191\189zY\239\191\189\\n-e\239\191\189s\239\191\189\239\191\189\239\191\189\&9\239\191\189\239\191\189\239\191\189\239\191\189\\u0012Rq\\u0016\239\191\189d/\239\191\189\239\191\189\239\191\189 y,\\u0008\239\191\189\DEL\\u0000F\239\191\189\239\191\189\239\191\189\239\191\189\\u0001\\u0000\\u0000\",\"label\":\"server-error\"}")
      CallStack (from HasCallStack):
        error, called at src/Bilge/Assert.hs:99:18 in bilge-0.22.0-KwPXBVYHFlF3OFbmYHfwv3:Bilge.Assert
        <!!, called at src/Bilge/Assert.hs:107:19 in bilge-0.22.0-KwPXBVYHFlF3OFbmYHfwv3:Bilge.Assert
        !!!, called at test/integration/Federation/End2end.hs:143:3 in main:Federation.End2end

@jschaul jschaul changed the title WIP: Federation: perform some TLS validation when talking to remote servers WIP: Federation: perform some TLS validation when talking to remote servers [skip ci] Jul 14, 2021
@jschaul jschaul force-pushed the certificates-improvements branch from 17288d9 to 81c03e8 Compare July 14, 2021 18:44
jschaul and others added 5 commits July 14, 2021 21:03
The `catchError` middleware was running over gzipped responses, and this
caused the wrapping logic to fail, and to display a gzip-encoded error
message.
This prevents the response body to being preemptively converted to Text
when wrapping an error response inside a JSON object.
There is no need to use STM in `lazyResponseBody`, since it is not
interacting with any other STM action, and just using TVars inside
`atomically` blocks.
@pcapriotti pcapriotti mentioned this pull request Jul 16, 2021
10 tasks
This should fix the error reporting and show what the actual problem is.
@pcapriotti pcapriotti changed the title WIP: Federation: perform some TLS validation when talking to remote servers [skip ci] WIP: Federation: perform some TLS validation when talking to remote servers Jul 16, 2021
jschaul and others added 3 commits July 19, 2021 16:01
The HTTP manager should only be used to perform requests to internal
services, so we do not need to set up SSL for it.
 - Add configuration option to use system CA store
 - Propagate error when the custom CA store path cannot be loaded
 - Add ability to use both a custom and a system CA store
@pcapriotti pcapriotti force-pushed the certificates-improvements branch from f5d21c5 to 8c0890c Compare July 21, 2021 08:39
Comment on lines +1 to +14
apiVersion: v1
kind: Secret
metadata:
name: "federator-ca"
labels:
wireService: federator
chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
data:
# TODO: add validation and fail early during templating: either contents should be provided; or explicitly system trust store enabled
{{- if .Values.remoteCAContents }}
remote-ca.pem: {{ .Values.remoteCAContents | b64enc | quote }}
{{- end }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jschaul Why does this have to be a secret? It is the public key anyway. I think we can merge it into the configmap

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this refactoring the followup PR #1682

@pcapriotti pcapriotti marked this pull request as ready for review July 22, 2021 14:52
Copy link
Member Author

@jschaul jschaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good now, though there seem some CI failures still? We'll need to tweak the federation environments when this is merged; and see whether useSystemCAStore will work (I think it won't, since currently the grpc-ingress doesn't have let's encrypt properly configured)

requests: {}
imagePullPolicy: Always
config:
optSettings:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for CI, shouldn't this set useSystemCAStore to false?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think you're right. Also, should the default assignment useSystemCAStore: true reside in values.yaml or in the template?

1. `remoteCAStore`: Maybe Filepath. This config option can be used to specify
multiple certificates from either a single file (multiple PEM formatted
certificates concatenated) or directory (one certificate per file, file names
are hashes from certificate).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to provide an example yaml documentation block with the exact syntax and spelling. And ti would be good to list the default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some examples in the followup PR #1682 (commit 011e83a2fa55e10554a825b70e2f9325487b57c2), since that also contains new options.

{{- if $.Values.remoteCAContents }}
remoteCAStore: "/etc/wire/federator/ca/remote-ca.pem"
{{- end }}
federationStrategy:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Remember to update the values in the federation anta/bella/chala environments when this PR gets merged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part should be fine in anta/bella/chala, as by default we use the system CA, and those certificates are supposed to be Let's Encrypt. However, it seems the Let's Encrypt certificates are not actually being used, so I guess there is a problem somewhere in the ingress configuration.

@pcapriotti pcapriotti merged commit d3db326 into develop Jul 27, 2021
@pcapriotti pcapriotti deleted the certificates-improvements branch July 27, 2021 07:53
pcapriotti added a commit that referenced this pull request Jul 28, 2021
pcapriotti added a commit that referenced this pull request Aug 4, 2021
It is currently not so easy to distinguish this particular error from a
generic TLS error (see #1662 for more context).

Since `InvalidCertificate` is never thrown, this PR simply removes it.
Note that this is a breaking change in the federation protobuf.
pcapriotti added a commit that referenced this pull request Sep 9, 2021
It is currently not so easy to distinguish this particular error from a
generic TLS error (see #1662 for more context).

Since `InvalidCertificate` is never thrown, this PR simply removes it.
Note that this is a breaking change in the federation protobuf.
pcapriotti added a commit that referenced this pull request Sep 9, 2021
* Document federation errors

Co-authored-by: Marko Dimjašević <[email protected]>

* Remove `InvalidCertificate` federation error

It is currently not so easy to distinguish this particular error from a
generic TLS error (see #1662 for more context).

Since `InvalidCertificate` is never thrown, this PR simply removes it.
Note that this is a breaking change in the federation protobuf.

* Remove labels from protobuf errors

* Improve federation error descriptions

Also suggest client behaviour in some cases.

Co-authored-by: Marko Dimjašević <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants