Spar Polysemy: SAML2 effect #1827

Merged
merged 18 commits into from
Oct 4, 2021
@isovector isovector commented Sep 30, 2021

This PR adds some new effects, completely isolating us from the saml2-web-sso interface:

  • SAML2
  • Now
  • SparRoute

It also does some refactoring around the canonical interpretation of Spar, moving it into its own module --- necessary to break some cyclic dependencies, and a nice touch all around.

As of this change, all the necessary parts of saml2-web-sso are packaged as an effect, meaning we can aggressively remove instances on Spar. But these instances aren't going far, they're now part of the implementation inside of Spar.Sem.SAML2.SAML2WebSso --- at least until we polysemize saml2-web-sso also.
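
To make the idea of "packaging something as an effect" concrete, here is a minimal sketch of what a `Now`-style effect could look like in Polysemy. The PR's actual definition is not shown on this page, so the constructor name `GetNow`, the use of `UTCTime`, and the interpreter names `nowToIO` and `nowToConst` are assumptions made for illustration only.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeOperators #-}

module Main where

import Data.Kind (Type)
import Data.Time (UTCTime (..), fromGregorian, getCurrentTime)
import Polysemy

-- Hypothetical effect: the only operation is asking for the current time.
data Now (m :: Type -> Type) (a :: Type) where
  GetNow :: Now m UTCTime

-- Hand-written smart constructor (instead of Template Haskell's makeSem).
getNow :: Member Now r => Sem r UTCTime
getNow = send GetNow

-- Production-style interpreter: read the real clock.
nowToIO :: Member (Embed IO) r => Sem (Now ': r) a -> Sem r a
nowToIO = interpret $ \case
  GetNow -> embed getCurrentTime

-- Test-style interpreter: always answer with a fixed time, no IO needed.
nowToConst :: UTCTime -> Sem (Now ': r) a -> Sem r a
nowToConst t = interpret $ \case
  GetNow -> pure t

main :: IO ()
main = do
  let fixed = UTCTime (fromGregorian 2021 10 4) 0
  print (run (nowToConst fixed getNow))
  t <- runM (nowToIO getNow)
  print (t >= fixed)
```

The point of the split is that callers only mention `Member Now r`, so the same business logic can run against the real clock or a fixed one.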

Checklist

  • The PR Title explains the impact of the change.
  • The PR description provides context as to why the change should occur and what the code contributes to that effect. This could also be a link to a JIRA ticket or a Github issue, if there is one.
  • A file with the changelog entry in one or more suitable sub-sections. The sub-sections are marked by directories inside changelog.d.

Remove undefineds

Interpreting is really hard

Interpret everything

wip

Add toggleCookie to SAML2

Add Now effect

get it compiling

build

Remove HasCreateUUID instance for Spar
@isovector isovector requested a review from fisx September 30, 2021 20:38
inspectOrBomb ins get_a = do
fa <- Blah $ saml2ToSaml2WebSso get_a
maybe
(error "saml2ToSaml2WebSso called with an uninspectable weaving functor")
@isovector isovector Sep 30, 2021

Scary! This case is impossible under normal circumstances, but can be triggered if any of the monadic arguments to the SAML2 actions fail via Polysemy.Error.throw. A better solution here would be to instead throw an error of our own, but I wasn't sure which constructor to use --- due to a technical limitation in Polysemy, we can't rethrow.

What do you think is the right play here, @fisx?

Contributor Author

It's worth noting that this is entirely an artifact of the typeclass approach to saml2-web-sso, and will go away when we polysemize that library.

Contributor

how many places are there with monadic arguments? can we refactor those away somehow? or would that be harder than polysemizing saml2-web-sso?

Contributor Author

The issue is these two effects:

data SAML2 m a where
  AuthReq ::
    NominalDiffTime ->
    m Issuer ->
    IdPId ->
    SAML2 m (FormRedirect AuthnRequest)
  AuthResp ::
    Maybe TeamId ->
    m Issuer ->
    m URI ->
    (AuthnResponse -> AccessVerdict -> m resp) ->
    AuthnResponseBody ->
    SAML2 m resp

their m Blah arguments, in particular.

The tactics effect in Polysemy lets us lift functions a -> m b into functions f a -> m (f b), where f holds all of the accumulated state from other effects. In a world where saml2-web-sso were more flexible with the types of authreq and authresp, we could just push that f all the way through.
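
The lifting described above can be modelled without Polysemy. The following is a simplified sketch, not Polysemy's actual API: `F`, `liftF`, `inspectF` and `inspectOrFail` are hypothetical names, and `Either String` stands in for the accumulated state `f` of an error-like effect.

```haskell
module Main where

-- Model of the "accumulated state" functor f for an error-like effect:
-- either an earlier step failed (Left) or a value is available (Right).
type F = Either String

-- Lift a plain continuation a -> m b into f a -> m (f b).
-- If the state already records a failure there is no `a` to continue with,
-- so the failure is passed through untouched. (This is `traverse`.)
liftF :: Applicative m => (a -> m b) -> F a -> m (F b)
liftF _ (Left e) = pure (Left e)
liftF k (Right a) = Right <$> k a

-- Try to look inside the functor; Nothing when it holds a failure.
inspectF :: F a -> Maybe a
inspectF = either (const Nothing) Just

-- Same shape as inspectOrBomb: an external API insists on a plain `a`,
-- so an uninspectable state has to be turned into some error.
inspectOrFail :: F a -> Either String a
inspectOrFail fa =
  maybe (Left "uninspectable weaving functor") Right (inspectF fa)

main :: IO ()
main = do
  r1 <- liftF (\n -> pure (n + 1)) (Right 41 :: F Int)
  print r1
  r2 <- liftF (\n -> pure (n + 1)) (Left "boom" :: F Int)
  print r2
  print (inspectOrFail (Right 'x'))
  print (inspectOrFail (Left "boom" :: F Char))
```

As long as `f` can be pushed all the way through, `liftF` is enough. The inspection step is only needed when a caller demands an unwrapped value.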

The behavior here is not surprising though. Imagine in everyday MTL land, where we pass throwError blah as the argument to authreq. That too would fail. The types in tactics just make it very clear (and, simultaneously, very obscure) to see where we have these sorts of data dependencies.

IMO, the solution here is to make this a throw instead of error. But I don't expect to see it, since authreq and authresp are only ever called with SparRoute actions.

Life (and this interpreter) would be much easier if these were just pure values instead of monadic ones, which we could do without polysemizing saml2-web-sso --- though maybe some callers depend on this being monadic?
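
A small sketch of the difference between monadic and pure arguments, in plain Haskell with `Either String` as the error monad. `authReqM` and `authReqPure` are hypothetical stand-ins, not the real saml2-web-sso functions, and the `Issuer` type here is a toy.

```haskell
module Main where

-- Toy stand-in for the real Issuer type.
newtype Issuer = Issuer {issuerName :: String}

-- Monadic argument: the callee has to run the action, so any failure
-- inside the argument becomes a failure of the whole call.
authReqM :: Monad m => m Issuer -> m String
authReqM getIssuer = do
  issuer <- getIssuer
  pure ("AuthnRequest from " ++ issuerName issuer)

-- Pure argument: the caller resolves the Issuer first, so the callee
-- has no data dependency on an effectful computation.
authReqPure :: Issuer -> String
authReqPure issuer = "AuthnRequest from " ++ issuerName issuer

main :: IO ()
main = do
  print (authReqM (Right (Issuer "https://example.com")) :: Either String String)
  print (authReqM (Left "no issuer configured") :: Either String String)
  putStrLn (authReqPure (Issuer "https://example.com"))
```

With the pure variant an interpreter never has to unwrap an effectful argument, which is the case that forces the `error`/`throw` fallback discussed in this thread.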

Base automatically changed from spar-no-monad-reader to develop October 1, 2021 01:47
result :: SAML.ResponseVerdict <- verdictHandler cky mbtid resp verdict
throwError $ SAML.CustomServant result
result :: SAML.ResponseVerdict <- runSparInSem $ verdictHandler cky mbtid resp verdict
throw @SparError $ SAML.CustomServant result
Contributor

Note to self: when we get to cleaning up the error mess, I want to remove CustomServant from the library.

import Spar.Sem.SparRoute
import Wire.API.Routes.Public.Spar

-- TODO(sandy): Why is this instance not provided by SAML? Very rude!
Contributor

(even though I don't think defining that in another place will help make this more readable...)

@fisx fisx left a comment

I have some minor suggestions, you decide whether they make sense, and whether you want to act on them here or in a different PR.

Otherwise all good!

inspectOrBomb ins get_a = do
fa <- SPImpl $ saml2ToSaml2WebSso get_a
maybe
(SPImpl . throw @SparError $ SAML.CustomError $ SparInternalError "saml2ToSaml2WebSso called with an uninspectable weaving functor")
Contributor

This can only possibly happen in the verdict handler, and I think it's fine. Everything thrown via the error effects (mtl or polysemy) is caught before this, and everything thrown elsewhere is a legitimate internal error.

@isovector isovector force-pushed the saml2-effect branch 2 times, most recently from 0156ccc to 586062f Compare October 4, 2021 16:56
@isovector isovector merged commit 741fc51 into develop Oct 4, 2021
@isovector isovector deleted the saml2-effect branch October 4, 2021 19:45
@julialongtin julialongtin mentioned this pull request Oct 29, 2021
julialongtin added a commit that referenced this pull request Oct 29, 2021
* Make federated connection functions work with qualified IDs (#1819)

* Add stub for remote connection creation

* Make connection DB functions work with Qualified

* Simplify name of createConnection

* Fix order of arguments in createConnection

* Do not assert on 1-1 conversation names

* Use Local newtype for some more local arguments

Co-authored-by: jschaul <[email protected]>

* Fix detail in stern online help (#1834)

* Spar Polysemy: SAML2 effect (#1827)

* Use Input effect instead of a MonadReader instance

* Remove ReaderT

* Fix package.yaml

* Changelog

* Review responses

* SAML work

Remove undefineds

Interpreting is really hard

Interpret everything

wip

Add toggleCookie to SAML2

Add Now effect

get it compiling

build

Remove HasCreateUUID instance for Spar

* Cleanup

* CanonicalInterpreter and necessary changes

* Rename to SPImpl

* Fake CI

* Another fake CI

* Use catch in polysemy

* Respond to review

* Changelog

* Apply suggestions from code review

Co-authored-by: fisx <[email protected]>

* Hi CI

* make format

Co-authored-by: fisx <[email protected]>

* Spar Polysemy: Fully polysemize Spar (#1833)

* Remove wrapMonadClientSem

Put it into the Cassandra interpreter instead

* Remove MonadIO instance

* Remove MonadError instance

* Remove ExceptT

* Remove Final IO from Spar

* Fix one use of undefined

* Reporter effect; NO MORE IO

* Remove the Spar newtype

* Remove Spar type

* Stylistic cleanup

* Changelog

* Weird rebase problem

* Review comments

* Use hs-certificate master (#1822)

* Use master branch of hs-certificate

The error handling fix
haskell-tls/hs-certificate#125 has been merged, so
we can just use the upstream master now, and later switch to the hackage
package once it is released.

* Servantify legacy addMember endpoint (#1838)

* Use helmfile's parallelism to speed up integration test setup time (#1805)

Motivation: decrease integration setup time, especially for the default two-backend setup. Make use of tooling used elsewhere, and use less of hacky bash scripts. See also https://wearezeta.atlassian.net/wiki/spaces/PS/pages/513573957/CI+runs+of+wire-server+state+and+possible+improvements for a discussion of other CI improvement opportunities.

This should shave about 5 minutes off the setup time for each CI run, simply because all helm charts for both backends are now installed in parallel rather than sequentially (that is, `make kube-integration-setup` should now be faster than before this PR).

- Create a few FUTUREWORKS in Jira and link to them from the code comments
- Create two helmfiles, one for federation, one for single-backend
- Add helmfile to nix-shell tooling (Helmfile itself comes with a different version of helm; but since so
far things inside nix-shell are only in use for local development, this
should not matter too much. In the future this can be streamlined with
wire-server-deploy to use the same versions everywhere)

* [Federation] Include Remote Connections in Listing All Connections (#1826)

* Expand a test to also include remote connections while listing

* Remove deprecated endpoint for listing convs (#1840)

* Remove deprecated endpoint for listing convs

Also removed the V2 from the name of the endpoint (in the code, not in
the endpoint path).

* Remove /list-conversations from nginx conf

* Remove use of /list-conversations from End2end

* Federation: Allow connecting to remote users (#1824)

One2One conversations are not created yet. This will be worked upon separately.
Legal-hold restrictions are not dealt with yet either; for now, it will not be allowed to turn on legal-hold and federation at the same time.

Co-authored-by: Stefan Matting <[email protected]>
Co-authored-by: jschaul <[email protected]>
Co-authored-by: Akshay Mankar <[email protected]>

* Fix more swagger validation errors (#1841)

* Fix more swagger validation errors

These could be prevented by turning some lists to sets in the swagger2
package, but for now we simply go through all the schemas in the
`Swagger` structure, and apply `nub` on them.

* Various cleanups of Qualified and related types (#1839)

* Refactor tagged Qualified types

This makes the `Local` and `Remote` type constructor safer, because now
it is not possible to change the domain inside a tagged value using the
`Functor` instance.
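
A minimal sketch of why a dedicated tagged newtype is safer, assuming simplified versions of the types involved. The names below are illustrative, and in a real module the `QualifiedWithTag` constructor would not be exported. The `Functor` instance only reaches the payload, never the domain.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}

module Main where

newtype Domain = Domain String
  deriving (Eq, Show)

data Qualified a = Qualified {qUnqualified :: a, qDomain :: Domain}
  deriving (Eq, Show)

-- Type-level tag saying where the value lives.
data QTag = QLocal | QRemote

-- Phantom-tagged wrapper around a Qualified value.
newtype QualifiedWithTag (t :: QTag) a = QualifiedWithTag (Qualified a)
  deriving (Eq, Show)

type Local = QualifiedWithTag 'QLocal

type Remote = QualifiedWithTag 'QRemote

-- fmap touches only the payload; the domain is carried over unchanged.
instance Functor (QualifiedWithTag t) where
  fmap f (QualifiedWithTag (Qualified a d)) = QualifiedWithTag (Qualified (f a) d)

tUnqualified :: QualifiedWithTag t a -> a
tUnqualified (QualifiedWithTag q) = qUnqualified q

tDomain :: QualifiedWithTag t a -> Domain
tDomain (QualifiedWithTag q) = qDomain q

main :: IO ()
main = do
  let alice = QualifiedWithTag (Qualified "alice" (Domain "example.com")) :: Remote String
      len = fmap length alice
  print (tUnqualified len)
  print (tDomain len == tDomain alice)
```

If the tag were instead a generic wrapper whose `Functor` instance maps over the whole `Qualified a`, a caller could swap the domain while keeping the `Remote` tag.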

* Rename `partitionQualified` to `indexQualified`

* Refactor partitionRemoteOrLocalIds

Also rename it to partitionQualified and swap the order of results.

* Refactor and rename `partitionRemote`

The `partitionRemote` function has been renamed to `indexRemote` for
consistency with `indexQualified`, and it now returns a list of `Remote
[a]`, which preserves the information about the domains being remote.

* Remove some uses of toRemoteUnsafe

* Remove convId from ConversationMetadata

Also change type of toRemoteUnsafe and toLocalUnsafe to just take a `Domain` and
an `a` instead of `Qualified a`.

* Remove one more use of toRemoteUnsafe

* Remove lUnqualified and lDomain

We can simply use the general versions that work for both qualified
tags.

* Remove renderQualified and corresponding test

It was completely unused.

* Use data kinds for Id tags

* Better schema instance for `Qualified` values

* Add CHANGELOG entry

* Create remote 1-1 conversations (#1825)

* Extract function to create UserList

* Add stub for remote 1-1 conversation creation

* Compute remote 1-1 conversation IDs

* ensureConnected now takes a UserList

* Make /conversations/one2one federation-aware

Converted the endpoint for creating 1-1 conversations to the new
conversation ID algorithm, and enabled the endpoint to create 1-1
conversations with federated users.

Note: the case when the conversation needs to be hosted by the remote
domain is still not implemented. We probably need a new RPC for this
case.

* Remove create from UUID Version class

The create function cannot be defined for all UUID versions.

* Introduce V5 UUIDs and use them for 1-1 conv

* Servantify internal endpoint for connect conv

* Make recipient field of connect event qualified

* Extract function to create legacy connect conv

* Add tests for the conversation ID algorithm

* write internal with stubs for data functions

* Implement a function for creating and updating a 1-1 remote conversation

- The function is Galley.API.One2One.iUpsertOne2OneConversation

* use schema-profunctor for json instances

galley-types: no lax

* galley-types rename module to Intra

* galley: remove "these" dep

galley.cabal

* fix impossible example

* remove todo

* un-nameclash: one2OneConvId -> localOne2OneConvId

* remove warning suppression

* brig: add rpc function

* change api: always return a conv id

* Add tests for one2one conversation internal endpoint

* Test remote one2one conversation case

* Update golden tests after change in connect event

* Add CHANGELOG entry

* Remove incorrect comment

Co-authored-by: Marko Dimjašević <[email protected]>
Co-authored-by: Stefan Matting <[email protected]>

* Leave a note with a link to a Jira ticket about a flaky test (#1844)

* Make non-collision test for 1-1 conv ids faster (#1846)

The `anySame` function has quadratic runtime, but here we can use an
`Ord` instance, and just compare the `nubOrd` lists. This also removes a
potential flakiness caused by repeated input pairs (which should be
quite likely to happen, given the low entropy of the UUID generator).
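
A sketch of the idea, assuming the test has roughly this shape (the real test code is not shown here; `anySameQuadratic` and `noCollisions` are hypothetical names). Comparing the number of distinct inputs with the number of distinct outputs is O(n log n) and is not confused by an input pair that was generated twice.

```haskell
module Main where

import Data.Containers.ListUtils (nubOrd)

-- Quadratic duplicate check: every element is compared with all later ones.
anySameQuadratic :: Eq a => [a] -> Bool
anySameQuadratic [] = False
anySameQuadratic (x : xs) = x `elem` xs || anySameQuadratic xs

-- A function is collision-free on the given inputs when it maps the distinct
-- inputs to equally many distinct outputs. Repeated inputs are deduplicated
-- on both sides, so they cannot produce a false alarm.
noCollisions :: (Ord a, Ord b) => (a -> b) -> [a] -> Bool
noCollisions f inputs =
  length (nubOrd inputs) == length (nubOrd (map f inputs))

main :: IO ()
main = do
  let inputs = [(1, 2), (3, 4), (1, 2)] :: [(Int, Int)]
  -- A duplicate-only check flags the repeated pair's output as a collision.
  print (anySameQuadratic (map (uncurry (+)) inputs))
  -- The distinct-count comparison does not.
  print (noCollisions (uncurry (+)) inputs)
  -- A genuinely colliding function is still detected.
  print (noCollisions (const (0 :: Int)) inputs)
```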

* add comment to test for FUTUREWORK (#1848)

* Fix error in member csv creation (SAML.UserRef decoding error) (#1828)

* Add failing test case.

* Nit-pick.

* Do not git-ignore pem files (at least not all of them).

* Fix error message.

* More detail in scim error responses.

* An idea.

* Implement the idea.

* FUTUREWORK.

* Update One2One conversation when connection status changes (#1850)

* move one2oneConvId to galley-types

* implement updateOne2OneConv and simple test

* add more test cases

* Clarify 403 in test

* add changelog entry

* chore: [charts] Update webapp version (#1836)

Co-authored-by: Zebot <[email protected]>

* chore: [charts] Update team-settings version (#1835)

Co-authored-by: Zebot <[email protected]>

* update to latest SFT. (#1849)

* update to latest SFT.

* Add changelog entry for SFT

Co-authored-by: jschaul <[email protected]>

* Upgrade webapp/team-settings: changelog entries for #1835 and #1836 (#1856)

* Fix SFTD in umbrella chart (#1677)

* Fix SFTD in umbrella chart

* changelog

Co-authored-by: jschaul <[email protected]>

* Move SFTD public IP docs to the top (#1672)

It's the thing people confuse the most. Hopefully people will get it wrong less now

* [charts:sftd] Introduce flag to enable TURN discovery (#1519)

* [charts:sftd] Introduce flag to enable TURN discovery

* -f integrate review feedback

* changelog

Co-authored-by: jschaul <[email protected]>

* Check extended key usage of server certificates (#1855)

* Test that server key usage is checked for fed cert

* Reject certificates without server usage flag

* Access updates affect remote users (#1854)

* Rename NotificationTargets to BotsAndMembers

* Refactor logic to remove users after access update

 - Avoid using lenses and state; since there are only two updates, these
 can be threaded manually pretty easily.
 - Rename the `NotificationTargets` type to `BotsAndMembers`, and use
 that instead of pairs (or triples) in the access update function.

This endpoint is still not properly federation-aware, since remote
members are not removed, and local member removals are not propagated to
remotes.

Co-authored-by: Stefan Matting <[email protected]>

* Re-enable multiple victims when removing members

This is useful to batch removals occurring after an access update to a
conversation.

* Remove and notify remotes on access update

* Access update removal tests

* Remove duplication in test conversation creation

Co-authored-by: Paolo Capriotti <[email protected]>
Co-authored-by: Marko Dimjašević <[email protected]>

* Change tag (#1859)

* Check connections when adding remote users to a conv (#1842)

* Delete stale FUTUREWORK

* Brig: delete deprecated `GET /i/users/connections-status` endpoint

* brig: Servantify POST /i/users/connection-status

* brig: Add internal endpoint to get qualified connection statuses

* Brig: Support creating accepted connections for tests

The endpoint just creates DB entries without actually contacting the remote
backend. This is very useful when galley tests need a remote connection to exist

* wire-api: roundtrip test for To/FromByteString @relation

The instances were deleted a couple of commits ago.

* Check conn between adder and remotes when adding remotes to conv

* Check connection between conversation creator and remote members

* Do connection checking in onConversationCreated in the federation API

* Make existing federation tests succeed again by sprinkling some connections

* Add a (still failing) test for on-conversation-created

* Add more connections to pass federation API tests

* onConvCreated: Ensure creator of conv is included as other member

* More coverage for onConvCreated

* onConvUpdated: Only allow connected users to add local users

* Add test case: Only unconnected users to add

* Fix integration tests

Co-authored-by: Marko Dimjašević <[email protected]>
Co-authored-by: jschaul <[email protected]>
Co-authored-by: Stefan Matting <[email protected]>
Co-authored-by: Paolo Capriotti <[email protected]>

* Make conversation creator unqualified in on-conversation-created RPC (#1858)

* Unqualify rcOrigId in `on-conversation-created`

Also add some Remote and Local tags to various functions.

* Simplify partitioning in onConversationCreated

* Improve comment about creator ID in RPC

* Ensure creator in the conv domain in tests

Co-authored-by: jschaul <[email protected]>

* Parallelise RPCs (#1860)

* Add runFederatedConcurrently utility

* Parallelise remote conversation notification

* Add Local and Remote tags to profile functions

* Parallelise RPCs for fetching profiles

* Rename indexRemote to bucketRemote

This makes it consistent with indexQualified and bucketQualified.

* Move traverseWithErrors to Util module

* Parallelise claimMultiPrekeyBundles

* Close GRPC client after making a request to a remote federator (#1865)

* Add Resource effect to InternalServer stack

* Ensure GRPC clients are closed after a request

* Allow using kind cluster with imagePullPolicy=Never (#1862)

* Allow using kind cluster with imagePullPolicy=Never

drive-by fix: create namespace if it doesn't exist yet

* Update helm version in nix-shell to fit version used elsewhere

* set kind kubeconfig permissions correctly

* fixup helmfile

* Hi CI

* disable flaky test in gundeck (#1867)

* disable flaky test in gundeck

* Hi CI

* Check connections when creating group and team convs with remote members  (#1870)

* Remove unnecessary remote domain from mock federator

* Remove unnecessary check for remote users' existence in createConv

Since we check for connections, we don't need to also find out if the users
exist.

* Check remote connections when creating team conv

Just like for regular group conversations, do not fetch profiles, and
instead check both local and remote connections.

Also added failure tests for team conversation creation with unconnected
locals or remotes.

* Remove opts argument for mock federator

* Add CHANGELOG entries

Co-authored-by: Paolo Capriotti <[email protected]>

* minor Readme: document usage of helm charts (#1307)

* Support deleting conversations with federated users (#1861)

* Refactor: Use pushConversationEvent

* add onConversationDeleted RPC

* deleteTeamConversation: rpc onConversationDeleted

* Data.deleteConversation: remove remotes

* add changelog entry

* wire-api: extend ConversationAction

* onConversationDeleted -> onConversationUpdated

* fix compilation

* remove duplicated import

* cosmetic change

* fix call to withTempServantMockFederator

* Remove a leftover TODO that was addressed (#1868)

* In Conversation Endpoints Make the members.self ID Qualified (#1866)

* Make the self member's ID qualified
* Simplify conversation view functions
* Unrelated small change: remove a cycle of qualifying a conversation ID in a test
* Introduce qualifyLocal to the BotNet monad

* Changelog script: skip empty sections (#1871)

* Replace shell.nix with a direnv + nixpkgs.buildEnv based setup (#1876)

* Replace shell.nix with a direnv + nixpkgs.buildEnv based setup

* Add instructions on how to use nix-hls.sh from emacs

* Correctly update PATH in .envrc (#1877)

* Introduce 'make flake-PATTERN' (#1875)

Add a 'make flake-PATTERN' target to run a subset of tests multiple times to trigger a failure case in flaky tests. By default the test(s) will run up to 1000 times until a failure occurs, at which point it will stop. Scrolling up on the output will show you how many tests had to run to trigger a failure.

example output:

```
make flake-sso-id
echo 'set -ex' > /tmp/flake.sh
chmod +x /tmp/flake.sh
for i in $(seq 1000); do \
	echo "echo $i" >> /tmp/flake.sh; \
	echo '../../dist/brig-integration -s brig.integration.yaml -i ../integration.yaml -p "sso-id" ' >> /tmp/flake.sh; \
done
INTEGRATION_USE_NGINZ=1 ../integration.sh /tmp/flake.sh
Running tests using mocked AWS services
[cannon] I, Listening on 127.0.0.1:8083
[cannon] I, Listening on 127.0.0.1:8183
[cargohold] I, Listening on 0.0.0.0:8084
[spar] I, logger=cassandra.spar, Known hosts: [datacenter1:rack1:127.0.0.1:9042]
[federator] D, inotify initialized, inotify=<inotify fd=11>
[gundeck] I, Listening on 0.0.0.0:8086
[galley] I, Listening on 127.0.0.1:8085
[spar] I, Listening on 0.0.0.0:8088
[nginz] 127.0.0.1 - - [20/Oct/2021:16:33:50 +0200] "GET /i/status HTTP/1.1" 200 0 "-" "curl/7.71.1" "-" - 2 0.000 - - - - 3cabaf643c510db36a3c989301d73569
all services are up!
++ echo 1
1
++ ../../dist/brig-integration -s brig.integration.yaml -i ../integration.yaml -p sso-id
2021-10-20T14:33:51Z, D, Connecting to 127.0.0.1:9042
2021-10-20T14:33:51Z, I, Known hosts: [datacenter1:rack1:127.0.0.1:9042]
2021-10-20T14:33:51Z, I, New control connection: datacenter1:rack1:127.0.0.1:9042#<socket: 3>
Brig API Integration
  user
    account
      put /i/users/:uid/sso-id: OK (0.82s)

All 1 tests passed (0.83s)
++ echo 2
2
++ ../../dist/brig-integration -s brig.integration.yaml -i ../integration.yaml -p sso-id
2021-10-20T14:33:53Z, D, Connecting to 127.0.0.1:9042
Brig API Integration
  user
    account
      put /i/users/:uid/sso-id: 2021-10-20T14:33:53Z, I, Known hosts: [datacenter1:rack1:127.0.0.1:9042]
2021-10-20T14:33:53Z, I, New control connection: datacenter1:rack1:127.0.0.1:9042#<socket: 3>
OK (0.85s)

All 1 tests passed (0.85s)
++ echo 3
3
++ ../../dist/brig-integration -s brig.integration.yaml -i ../integration.yaml -p sso-id
2021-10-20T14:33:55Z, D, Connecting to 127.0.0.1:9042
Brig API Integration
  user
    account
      put /i/users/:uid/sso-id: 2021-10-20T14:33:55Z, I, Known hosts: [datacenter1:rack1:127.0.0.1:9042]
2021-10-20T14:33:55Z, I, New control connection: datacenter1:rack1:127.0.0.1:9042#<socket: 3>
OK (0.77s)

All 1 tests passed (0.77s)
++ echo 4
4
++ ../../dist/brig-integration -s brig.integration.yaml -i ../integration.yaml -p sso-id
2021-10-20T14:33:56Z, D, Connecting to 127.0.0.1:9042
Brig API Integration
  user
    account
      put /i/users/:uid/sso-id: 2021-10-20T14:33:56Z, I, Known hosts: [datacenter1:rack1:127.0.0.1:9042]
2021-10-20T14:33:56Z, I, New control connection: datacenter1:rack1:127.0.0.1:9042#<socket: 3>
OK (0.79s)

All 1 tests passed (0.79s)
++ echo 5
5
++ ../../dist/brig-integration -s brig.integration.yaml -i ../integration.yaml -p sso-id
```

When a failure happens:

```
++ echo 282
282
++ ../../dist/brig-integration -s brig.integration.yaml -i ../integration.yaml -p sso-id
2021-10-20T14:41:25Z, D, Connecting to 127.0.0.1:9042
[brig] W, logger=cassandra.brig, Server warning: Read 0 live rows and 2102 tombstone cells for query SELECT * FROM brig_test.users_pending_activation WHERE  LIMIT 10000 (see tombstone_warn_threshold)
Brig API Integration
  user
    account
      put /i/users/:uid/sso-id: 2021-10-20T14:41:25Z, I, Known hosts: [datacenter1:rack1:127.0.0.1:9042]
2021-10-20T14:41:25Z, I, New control connection: datacenter1:rack1:127.0.0.1:9042#<socket: 3>
[brig] W, logger=cassandra.brig, Server warning: Read 0 live rows and 2104 tombstone cells for query SELECT * FROM brig_test.users_pending_activation WHERE  LIMIT 10000 (see tombstone_warn_threshold)
FAIL
        Exception: Assertions failed:
         1: 202 =/= 403
         2: updatePhone (PUT /self/phone): failed to update to Phone {fromPhone = "+046965171332989"} - might be a flaky test tracked in https://wearezeta.atlassian.net/browse/BE-526

        Response was:

        Response {responseStatus = Status {statusCode = 403, statusMessage = "Forbidden"}, responseVersion = HTTP/1.1, responseHeaders = [("Transfer-Encoding","chunked"),("Date","Wed, 20 Oct 2021 14:41:27 GMT"),("Server","Warp/3.3.13"),("Content-Encoding","gzip"),("Content-Type","application/json")], responseBody = Just "{\"code\":403,\"message\":\"The given phone number has been blacklisted due to suspected abuse or a complaint.\",\"label\":\"blacklisted-phone\"}", responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose}
        CallStack (from HasCallStack):
          error, called at src/Bilge/Assert.hs:89:5 in bilge-0.22.0-5tCtgpJGKRb38JsbN4shGd:Bilge.Assert
          <!!, called at src/Bilge/Assert.hs:107:19 in bilge-0.22.0-5tCtgpJGKRb38JsbN4shGd:Bilge.Assert
          !!!, called at test/integration/Util.hs:735:3 in main:Util
          updatePhone, called at test/integration/API/User/Account.hs:1230:11 in main:API.User.Account

1 out of 1 tests failed (0.79s)
Terminated
Terminated
[brig] W, logger=cassandra.brig, Server warning: Read 0 live rows and 2106 tombstone cells for query SELECT * FROM brig_test.users_pending_activation WHERE  LIMIT 10000 (see tombstone_warn_threshold)
make: *** [Makefile:114: flake-sso-id] Error 1

```

* updatePhone deflake (#1874)

* updatePhone deflake debugging information

This is about https://wearezeta.atlassian.net/browse/BE-526

I think what's happening is that one test that tests the phone blocking
adds a record into the brig.excluded_phones entry. Then, another,
unrelated test, if unlucky enough to randomly generate a phone number
contained under that prefix, fails in the PUT /self/phone call.

* 1) update integration test output to give better information and link
  to a flaky test description
* 2) change the code to (hopefully) prevent this flake from re-occurring.

The changes to integration tests will lead to the following output on
failure:

  user
    account
      put /i/users/:uid/sso-id:
        Exception: Assertions failed:
         1: 202 =/= 403
         2: updatePhone (PUT /self/phone): failed to update to Phone {fromPhone = "+046965171332989"} - might be a flaky test tracked in https://wearezeta.atlassian.net/browse/BE-526

        Response was:

        Response {responseStatus = Status {statusCode = 403, statusMessage = "Forbidden"}, responseVersion = HTTP/1.1, responseHeaders = [("Transfer-Encoding","chunked"),("Date","Wed, 20 Oct 2021 14:41:27 GMT"),("Server","Warp/3.3.13"),("Content-Encoding","gzip"),("Content-Type","application/json")], responseBody = Just "{\"code\":403,\"message\":\"The given phone number has been blacklisted due to suspected abuse or a complaint.\",\"label\":\"blacklisted-phone\"}", responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose}
        CallStack (from HasCallStack):
          error, called at src/Bilge/Assert.hs:89:5 in bilge-0.22.0-5tCtgpJGKRb38JsbN4shGd:Bilge.Assert
          <!!, called at src/Bilge/Assert.hs:107:19 in bilge-0.22.0-5tCtgpJGKRb38JsbN4shGd:Bilge.Assert
          !!!, called at test/integration/Util.hs:735:3 in main:Util
          updatePhone, called at test/integration/API/User/Account.hs:1230:11 in main:API.User.Account

* undo changes in src as they make another test fail

* Add a cleanup line

* fixup

* Hi CI

* Include conv creator only once in notifications sent to remotes (#1879)

To remove any confusion in the `on-conversation-created` federation API, rename
"members" to "non_creator_members", since the creator is already specified in
"orig_user_id".

Also:
- Add Golden tests for `NewRemoteConversation`
- Add integration tests for creating conversation with remote users

* Optimise remote user deletion (#1872)

Creates two Federation RPCs:

* In brig: on-user-deleted, notify about the connections in chunks of 1000 users.
* In galley: on-user-deleted, notify about the conversations in chunks of 1000 conversations.
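
The chunked notification can be sketched as follows. This is a minimal illustration, not the code from the PR: `chunksOf` is written out by hand to stay dependency-free, and `notifyInChunks` is a hypothetical helper in which `notify` stands in for the actual RPC call.

```haskell
module Main where

-- Split a list into consecutive chunks of at most n elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | n <= 0 = error "chunksOf: chunk size must be positive"
  | null xs = []
  | otherwise =
      let (chunk, rest) = splitAt n xs
       in chunk : chunksOf n rest

-- Run one notification per chunk instead of one giant request.
notifyInChunks :: Monad m => Int -> ([a] -> m ()) -> [a] -> m ()
notifyInChunks n notify = mapM_ notify . chunksOf n

main :: IO ()
main =
  notifyInChunks
    1000
    (\chunk -> putStrLn ("notifying about " ++ show (length chunk) ++ " connections"))
    [1 .. 2500 :: Int]
```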

When writing integration tests in brig, we can mock the federator for brig but not for galley, because the two RPCs must be made from two separate places. So we had to mock out galley to be able to test the brig functionality. The galley functionality is tested separately by calling the internal endpoint.

Co-authored-by: Akshay Mankar <[email protected]>
Co-authored-by: Stefan Matting <[email protected]>

* Set federator's default log level to Info (#1882)

* Rename the two federation/on-user-deleted endpoints (#1883)

* Update Federation API conventions doc in prep for on-user-deleted

* brig/galley: Rename the two federation/on-user-deleted endpoints

This is to ensure that they do not overlap. This will hopefully make it easier
to merge brig and galley.

* Extract type level vars for UserDeleteNotificationMax{Conns,Convs}

* Galley polysemy (1/5) - Introduce Sem and "access" effects (#1881)

* Add type variable to Galley monad

This is step 0 in the process of converting galley to effects. We
introduce a phantom type variable `r` in the `Galley` monad, which will
later be used for the effect row.

* Use API instead of DB access in 1-1 conv test

* Monomorphise Data functions

* Avoid MonadUnliftIO in Bilge.RPC

* Remove unneeded MonadLogger constraint

* Introduce fine-grained placeholder effects

This commit introduces several placeholder effects, mostly having to do
with making HTTP requests. All the existing uses of `MonadUnliftIO` are
now either gone, or hidden behind one of these effects, and that made it
possible to get rid of the `MonadUnliftIO` instance for `Galley`.

Also, the `Galley0` type synonym now refers to `Galley` without any
effects, so `runGalley` and related functions now take a `Galley
GalleyEffects` instead.

`Galley0` still has a `MonadUnliftIO` instance, so it can be used as a
temporary crutch to get access to async primitives. Those need to be run
in `Galley0`, and finally lifted to a general `Galley r` monad.
Eventually, the `Galley0` actions will simply be replaced by effect
actions, and the code actually using `MonadUnliftIO` will be relegated
to interpreters.

* Remove MonadMask instance of Galley

This also introduces a `SparAccess` effect and adds a few more
`BrigAccess` and `BotAccess` constraints.

* Remove MonadCatch instance of Galley

* Turn Galley into a Sem newtype

The underlying `Sem` monad in `Galley` is an arbitrary effect stack that
contains at least the effects which replicate the functionality of the
original `Galley` monad. All the functionality has been reimplemented in
terms of `Sem`, so the existing code does not need to be changed at all.

* Allow configuring nginz so it serves the deeplink for apps to discover the backend (#1889)

Allow nginz to serve a deeplink (see also https://docs.wire.com/how-to/associate/deeplink.html )

Co-authored-by: jschaul <[email protected]>

* upgrade webapp to federation-capable (not for production use!) version. (#1892)

* Release 2021_10_29

Co-authored-by: Paolo Capriotti <[email protected]>
Co-authored-by: jschaul <[email protected]>
Co-authored-by: fisx <[email protected]>
Co-authored-by: Sandy Maguire <[email protected]>
Co-authored-by: Marko Dimjašević <[email protected]>
Co-authored-by: Stefan Matting <[email protected]>
Co-authored-by: Akshay Mankar <[email protected]>
Co-authored-by: Stefan Matting <[email protected]>
Co-authored-by: zebot <[email protected]>
Co-authored-by: Zebot <[email protected]>
Co-authored-by: Arian van Putten <[email protected]>
Co-authored-by: Lucendio <[email protected]>
isovector added a commit that referenced this pull request Nov 4, 2021
* Make federated connection functions work with qualified IDs (#1819)

* Add stub for remote connection creation

* Make connection DB functions work with Qualified

* Simplify name of createConnection

* Fix order of arguments in createConnection

* Do not assert on 1-1 conversation names

* Use Local newtype for some more local arguments

Co-authored-by: jschaul <[email protected]>

* Fix detail in stern online help (#1834)

* Spar Polysemy: SAML2 effect (#1827)

* Use Input effect instead of a MonadReader instance

* Remove ReaderT

* Fix package.yaml

* Changelog

* Review responses

* SAML work

Remove undefineds

Interpreting is really hard

Interpret everything

wip

Add toggleCookie to SAML2

Add Now effect

get it compiling

build

Remove HasCreateUUID instance for Spar

* Cleanup

* CanonicalInterpreter and necessary changes

* Rename to SPImpl

* Fake CI

* Another fake CI

* Use catch in polysemy

* Respond to review

* Changelog

* Apply suggestions from code review

Co-authored-by: fisx <[email protected]>

* Hi CI

* make format

Co-authored-by: fisx <[email protected]>

* Spar Polysemy: Fully polysemize Spar (#1833)

* Remove wrapMonadClientSem

Put it into the Cassandra interpreter instead

* Remove MonadIO instance

* Remove MonadError instance

* Remove ExceptT

* Remove Final IO from Spar

* Fix one use of undefined

* Reporter effect; NO MORE IO

* Remove the Spar newtype

* Remove Spar type

* Stylistic cleanup

* Changelog

* Weird rebase problem

* Review comments

* Use hs-certificate master (#1822)

* Use master branch of hs-certificate

The error handling fix
https://github.com/vincenthz/hs-certificate/pull/125 has been merged, so
we can just use the upstream master now, and later switch to the hackage
package once it is released.

* Servantify legacy addMember endpoint (#1838)

* Use helmfile's parallelism to speed up integration test setup time (#1805)

Motivation: decrease integration setup time, especially for the default two-backend setup. Make use of tooling used elsewhere, and use less of hacky bash scripts. See also https://wearezeta.atlassian.net/wiki/spaces/PS/pages/513573957/CI+runs+of+wire-server+state+and+possible+improvements for a discussion of other CI improvement opportunities.

This should shave about 5 minutes off the setup time of each CI run, simply because all helm charts for both backends are now installed in parallel rather than sequentially (that is, `make kube-integration-setup` should now be faster than before this PR).

- Create a few FUTUREWORKS in Jira and link to them from the code comments
- Create two helmfiles, one for federation, one for single-backend
- Add helmfile to nix-shell tooling (Helmfile itself comes with a different version of helm; but since so
far things inside nix-shell are only in use for local development, this
should not matter too much. In the future this can be streamlined with
wire-server-deploy to use the same versions everywhere)

* [Federation] Include Remote Connections in Listing All Connections (#1826)

* Expand a test to also include remote connections while listing

* Remove deprecated endpoint for listing convs (#1840)

* Remove deprecated endpoint for listing convs

Also removed the V2 from the name of the endpoint (in the code, not in
the endpoint path).

* Remove /list-conversations from nginx conf

* Remove use of /list-conversations from End2end

* Federation: Allow connecting to remote users (#1824)

One2One conversations are not created yet. This will be worked upon separately.
Legal-hold restrictions are not dealt with yet either; for now, it will not be allowed to turn on legal-hold and federation at the same time.

Co-authored-by: Stefan Matting <[email protected]>
Co-authored-by: jschaul <[email protected]>
Co-authored-by: Akshay Mankar <[email protected]>

* Fix more swagger validation errors (#1841)

* Fix more swagger validation errors

These could be prevented by turning some lists to sets in the swagger2
package, but for now we simply go through all the schemas in the
`Swagger` structure, and apply `nub` on them.

* Various cleanups of Qualified and related types (#1839)

* Refactor tagged Qualified types

This makes the `Local` and `Remote` type constructor safer, because now
it is not possible to change the domain inside a tagged value using the
`Functor` instance.
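
The idea can be illustrated with a small sketch in which the domain lives outside the part of the value that `fmap` can reach. All names here (`Tagged`, `LocalTag`, `RemoteTag`, `mkRemote`, `tValue`, `tDomain`) are hypothetical, and a `String` stands in for the domain type; the real wire-server definitions may differ.

```haskell
-- | A value together with the domain it belongs to.
data Qualified a = Qualified {qValue :: a, qDomain :: String}
  deriving (Eq, Show)

-- | Uninhabited tags distinguishing local from remote values at the type level.
data LocalTag
data RemoteTag

-- | A qualified value with a phantom tag. In a real module the constructor
-- would not be exported, so the domain cannot be rewritten from outside.
newtype Tagged t a = Tagged {untag :: Qualified a}
  deriving (Eq, Show)

-- | 'fmap' only touches the payload; the domain is carried along unchanged.
instance Functor (Tagged t) where
  fmap f (Tagged (Qualified x d)) = Tagged (Qualified (f x) d)

-- | Hypothetical smart constructor: the caller asserts the domain is remote.
mkRemote :: String -> a -> Tagged RemoteTag a
mkRemote d x = Tagged (Qualified x d)

tValue :: Tagged t a -> a
tValue = qValue . untag

tDomain :: Tagged t a -> String
tDomain = qDomain . untag
```

For example, `fmap (+ 1) (mkRemote "far.example.com" 41)` changes the payload but still reports `far.example.com` as its domain.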

* Rename `partitionQualified` to `indexQualified`

* Refactor partitionRemoteOrLocalIds

Also rename it to partitionQualified and swap the order of results.

* Refactor and rename `partitionRemote`

The `partitionRemote` function has been renamed to `indexRemote` for
consistency with `indexQualified`, and it now returns a list of `Remote
[a]`, which preserves the information about the domains being remote.
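
A rough sketch of grouping remote values by domain while keeping the "remote" marker on each group. The `Remote` record and `groupByDomain` below are simplified, hypothetical stand-ins (with `String` domains), not the actual `indexRemote` implementation.

```haskell
import qualified Data.Map.Strict as Map

-- | Simplified stand-in for a remote-tagged value: a payload plus its domain.
data Remote a = Remote {rDomain :: String, rValue :: a}
  deriving (Eq, Show)

-- | Group values by domain. Each group is itself a 'Remote', so the type still
-- records that every element lives on a remote backend.
-- Groups come out ordered by domain; elements keep their input order.
groupByDomain :: [Remote a] -> [Remote [a]]
groupByDomain rs = [Remote d xs | (d, xs) <- Map.toList grouped]
  where
    -- 'fromListWith' calls its function as @f new old@, so @flip (++)@
    -- appends the new element after the ones already collected.
    grouped = Map.fromListWith (flip (++)) [(rDomain r, [rValue r]) | r <- rs]
```

Returning `[Remote [a]]` rather than a bare map from domains to lists means callers can pass each group straight to functions that require a remote-tagged argument.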

* Remove some uses of toRemoteUnsafe

* Remove convId from ConversationMetadata

Also change type of toRemoteUnsafe and toLocalUnsafe to just take a `Domain` and
an `a` instead of `Qualified a`.

* Remove one more use of toRemoteUnsafe

* Remove lUnqualified and lDomain

We can simply use the general versions that work for both qualified
tags.

* Remove renderQualified and corresponding test

It was completely unused.

* Use data kinds for Id tags

* Better schema instance for `Qualified` values

* Add CHANGELOG entry

* Create remote 1-1 conversations (#1825)

* Extract function to create UserList

* Add stub for remote 1-1 conversation creation

* Compute remote 1-1 conversation IDs

* ensureConnected now takes a UserList

* Make /conversations/one2one federation-aware

Converted the endpoint for creating 1-1 conversations to the new
conversation ID algorithm, and enabled the endpoint to create 1-1
conversations with federated users.

Note: the case when the conversation needs to be hosted by the remote
domain is still not implemented. We probably need a new RPC for this
case.

* Remove create from UUID Version class

The create function cannot be defined for all UUID versions.

* Introduce V5 UUIDs and use them for 1-1 conv

* Servantify internal endpoint for connect conv

* Make recipient field of connect event qualified

* Extract function to create legacy connect conv

* Add tests for the conversation ID algorithm

* write internal with stubs for data functions

* Implement a function for creating and updating a 1-1 remote conversation

- The function is Galley.API.One2One.iUpsertOne2OneConversation

* use schema-profunctor for json instances

galley-types: no lax

* galley-types rename module to Intra

* galley: remove "these" dep

galley.cabal

* fix impossible example

* remove todo

* un-nameclash: one2OneConvId -> localOne2OneConvId

* remove warning suppression

* brig: add rpc function

* change api: always return a conv id

* Add tests for one2one conversation internal endpoint

* Test remote one2one conversation case

* Update golden tests after change in connect event

* Add CHANGELOG entry

* Remove incorrect comment

Co-authored-by: Marko Dimjašević <[email protected]>
Co-authored-by: Stefan Matting <[email protected]>

* Leave a note with a link to a Jira ticket about a flaky test (#1844)

* Make non-collision test for 1-1 conv ids faster (#1846)

The `anySame` function has quadratic runtime, but here we can use an
`Ord` instance, and just compare the `nubOrd` lists. This also removes a
potential flakiness caused by repeated input pairs (which should be
quite likely to happen, given the low entropy of the UUID generator).
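
As a sketch of the difference, assuming the check compares the number of distinct inputs with the number of distinct outputs (the actual test may be structured differently): `anySameQuadratic` and `collisionFree` are hypothetical names, while `nubOrd` is the function from `Data.Containers.ListUtils` in the `containers` package.

```haskell
import Data.Containers.ListUtils (nubOrd)

-- | Pairwise duplicate check: every element is compared against all later
-- ones, which is quadratic in the length of the list.
anySameQuadratic :: Eq a => [a] -> Bool
anySameQuadratic [] = False
anySameQuadratic (x : xs) = x `elem` xs || anySameQuadratic xs

-- | Check that @f@ maps distinct inputs to distinct outputs.
-- Deduplicating the inputs first means a repeated input no longer counts
-- as a collision; 'nubOrd' runs in O(n log n) thanks to the 'Ord' instance.
collisionFree :: (Ord a, Ord b) => (a -> b) -> [a] -> Bool
collisionFree f inputs =
  length (nubOrd inputs) == length (nubOrd (map f inputs))
```

With this shape, a generator that happens to produce the same input pair twice cannot fail the property on its own.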

* add comment to test for FUTUREWORK (#1848)

* Fix error in member csv creation (SAML.UserRef decoding error) (#1828)

* Add failing test case.

* Nit-pick.

* Do not git-ignore pem files (at least not all of them).

* Fix error message.

* More detail in scim error responses.

* An idea.

* Implement the idea.

* FUTUREWORK.

* Update One2One conversation when connection status changes (#1850)

* move one2oneConvId to galley-types

* implement updateOne2OneConv and simple test

* add more test cases

* Clarify 403 in test

* add changelog entry

* chore: [charts] Update webapp version (#1836)

Co-authored-by: Zebot <[email protected]>

* chore: [charts] Update team-settings version (#1835)

Co-authored-by: Zebot <[email protected]>

* update to latest SFT. (#1849)

* update to latest SFT.

* Add changelog entry for SFT

Co-authored-by: jschaul <[email protected]>

* Upgrade webapp/team-settings: changelog entries for #1835 and #1836 (#1856)

* Fix SFTD in umbrella chart (#1677)

* Fix SFTD in umbrella chart

* changelog

Co-authored-by: jschaul <[email protected]>

* Move SFTD public IP docs to the top (#1672)

It's the thing people confuse the most. Hopefully people will get it wrong less now

* [charts:sftd] Introduce flag to enable TURN discovery (#1519)

* [charts:sftd] Introduce flag to enable TURN discovery

* -f integrate review feedback

* changelog

Co-authored-by: jschaul <[email protected]>

* Check extended key usage of server certificates (#1855)

* Test that server key usage is checked for fed cert

* Reject certificates without server usage flag

* Access updates affect remote users (#1854)

* Rename NotificationTargets to BotsAndMembers

* Refactor logic to remove users after access update

 - Avoid using lenses and state; since there are only two updates, these
 can be threaded manually pretty easily.
 - Rename the `NotificationTargets` type to `BotsAndMembers`, and use
 that instead of pairs (or triples) in the access update function.

This endpoint is still not properly federation-aware, since remote
members are not removed, and local member removals are not propagated to
remotes.

Co-authored-by: Stefan Matting <[email protected]>

* Re-enable multiple victim when removing members

This is useful to batch removals occurring after an access update to a
conversation.

* Remove and notify remotes on access update

* Access update removal tests

* Remove duplication in test conversation creation

Co-authored-by: Paolo Capriotti <[email protected]>
Co-authored-by: Marko Dimjašević <[email protected]>

* Change tag (#1859)

* Check connections when adding remote users to a conv (#1842)

* Delete stale FUTUREWORK

* Brig: delete deprecated `GET /i/users/connections-status` endpoint

* brig: Servantify POST /i/users/connection-status

* brig: Add internal endpoint to get qualified connection statuses

* Brig: Support creating accepted connections for tests

The endpoint just creates DB entries without actually contacting the remote
backend. This is very useful when galley tests need a remote connection to exist

* wire-api: roundtrip test for To/FromByteString @Relation

The instances were deleted a couple of commits ago.

* Check conn between adder and remotes when adding remotes to conv

* Check connection between conversation creator and remote members

* Do connection checking in onConversationCreated in the federation API

* Make existing federation tests succeed again by sprinkling some connections

* Add a (still failing) test for on-conversation-created

* Add more connections to pass federation API tests

* onConvCreated: Ensure creator of conv is included as other member

* More coverage for onConvCreated

* onConvUpdated: Only allow connected users to add local users

* Add test case: Only unconnected users to add

* Fix integration tests

Co-authored-by: Marko Dimjašević <[email protected]>
Co-authored-by: jschaul <[email protected]>
Co-authored-by: Stefan Matting <[email protected]>
Co-authored-by: Paolo Capriotti <[email protected]>

* Make conversation creator unqualified in on-conversation-created RPC (#1858)

* Unqualify rcOrigId in `on-conversation-created`

Also add some Remote and Local tags to various functions.

* Simplify partitioning in onConversationCreated

* Improve comment about creator ID in RPC

* Ensure creator in the conv domain in tests

Co-authored-by: jschaul <[email protected]>

* Parallelise RPCs (#1860)

* Add runFederatedConcurrently utility

* Parallelise remote conversation notification

* Add Local and Remote tags to profile functions

* Parallelise RPCs for fetching profiles

* Rename indexRemote to bucketRemote

This makes it consistent with indexQualified and bucketQualified.

* Move traverseWithErrors to Util module

* Parallelise claimMultiPrekeyBundles

* Close GRPC client after making a request to a remote federator (#1865)

* Add Resource effect to InternalServer stack

* Ensure GRPC clients are closed after a request

* Allow using kind cluster with imagePullPolicy=Never (#1862)

* Allow using kind cluster with imagePullPolicy=Never

drive-by fix: create namespace if it doesn't exist yet

* Update helm version in nix-shell to fit version used elsewhere

* set kind kubeconfig permissions correctly

* fixup helmfile

* Hi CI

* disable flaky test in gundeck (#1867)

* disable flaky test in gundeck

* Hi CI

* Check connections when creating group and team convs with remote members  (#1870)

* Remove unnecessary remote domain from mock federator

* Remove unnecessary check for remote users' existence in createConv

Since we check for connections, we don't need to also find out if the users
exist.

* Check remote connections when creating team conv

Just like for regular group conversations, do not fetch profiles, and
instead check both local and remote connections.

Also added failure tests for team conversation creation with unconnected
locals or remotes.

* Remove opts argument for mock federator

* Add CHANGELOG entries

Co-authored-by: Paolo Capriotti <[email protected]>

* minor Readme: document usage of helm charts (#1307)

* Support deleting conversations with federated users (#1861)

* Refactor: Use pushConversationEvent

* add onConversationDeleted RPC

* deleteTeamConversation: rpc onConversationDeleted

* Data.deleteConversation: remove remotes

* add changelog entry

* wire-api: extend ConversationAction

* onConversationDeleted -> onConversationUpdated

* fix compilation

* remove duplicated import

* cosmetic change

* fix call to withTempServantMockFederator

* Remove a leftover TODO that was addressed (#1868)

* In Conversation Endpoints Make the members.self ID Qualified (#1866)

* Make the self member's ID qualified
* Simplify conversation view functions
* Unrelated small change: remove a cycle of qualifying a conversation ID in a test
* Introduce qualifyLocal to the BotNet monad

* Changelog script: skip empty sections (#1871)

* Replace shell.nix with a direnv + nixpkgs.buildEnv based setup (#1876)

* Replace shell.nix with a direnv + nixpkgs.buildEnv based setup

* Add instructions on how to use nix-hls.sh from emacs

* Correctly update PATH in .envrc (#1877)

* Introduce 'make flake-PATTERN' (#1875)

Add a 'make flake-PATTERN' target to run a subset of tests multiple times to trigger a failure case in flaky tests. By default the test(s) will run up to 1000 times until a failure occurs, at which point it will stop. Scrolling up on the output will show you how many tests had to run to trigger a failure.

example output:

```
make flake-sso-id
echo 'set -ex' > /tmp/flake.sh
chmod +x /tmp/flake.sh
for i in $(seq 1000); do \
	echo "echo $i" >> /tmp/flake.sh; \
	echo '../../dist/brig-integration -s brig.integration.yaml -i ../integration.yaml -p "sso-id" ' >> /tmp/flake.sh; \
done
INTEGRATION_USE_NGINZ=1 ../integration.sh /tmp/flake.sh
Running tests using mocked AWS services
[cannon] I, Listening on 127.0.0.1:8083
[cannon] I, Listening on 127.0.0.1:8183
[cargohold] I, Listening on 0.0.0.0:8084
[spar] I, logger=cassandra.spar, Known hosts: [datacenter1:rack1:127.0.0.1:9042]
[federator] D, inotify initialized, inotify=<inotify fd=11>
[gundeck] I, Listening on 0.0.0.0:8086
[galley] I, Listening on 127.0.0.1:8085
[spar] I, Listening on 0.0.0.0:8088
[nginz] 127.0.0.1 - - [20/Oct/2021:16:33:50 +0200] "GET /i/status HTTP/1.1" 200 0 "-" "curl/7.71.1" "-" - 2 0.000 - - - - 3cabaf643c510db36a3c989301d73569
all services are up!
++ echo 1
1
++ ../../dist/brig-integration -s brig.integration.yaml -i ../integration.yaml -p sso-id
2021-10-20T14:33:51Z, D, Connecting to 127.0.0.1:9042
2021-10-20T14:33:51Z, I, Known hosts: [datacenter1:rack1:127.0.0.1:9042]
2021-10-20T14:33:51Z, I, New control connection: datacenter1:rack1:127.0.0.1:9042#<socket: 3>
Brig API Integration
  user
    account
      put /i/users/:uid/sso-id: OK (0.82s)

All 1 tests passed (0.83s)
++ echo 2
2
++ ../../dist/brig-integration -s brig.integration.yaml -i ../integration.yaml -p sso-id
2021-10-20T14:33:53Z, D, Connecting to 127.0.0.1:9042
Brig API Integration
  user
    account
      put /i/users/:uid/sso-id: 2021-10-20T14:33:53Z, I, Known hosts: [datacenter1:rack1:127.0.0.1:9042]
2021-10-20T14:33:53Z, I, New control connection: datacenter1:rack1:127.0.0.1:9042#<socket: 3>
OK (0.85s)

All 1 tests passed (0.85s)
++ echo 3
3
++ ../../dist/brig-integration -s brig.integration.yaml -i ../integration.yaml -p sso-id
2021-10-20T14:33:55Z, D, Connecting to 127.0.0.1:9042
Brig API Integration
  user
    account
      put /i/users/:uid/sso-id: 2021-10-20T14:33:55Z, I, Known hosts: [datacenter1:rack1:127.0.0.1:9042]
2021-10-20T14:33:55Z, I, New control connection: datacenter1:rack1:127.0.0.1:9042#<socket: 3>
OK (0.77s)

All 1 tests passed (0.77s)
++ echo 4
4
++ ../../dist/brig-integration -s brig.integration.yaml -i ../integration.yaml -p sso-id
2021-10-20T14:33:56Z, D, Connecting to 127.0.0.1:9042
Brig API Integration
  user
    account
      put /i/users/:uid/sso-id: 2021-10-20T14:33:56Z, I, Known hosts: [datacenter1:rack1:127.0.0.1:9042]
2021-10-20T14:33:56Z, I, New control connection: datacenter1:rack1:127.0.0.1:9042#<socket: 3>
OK (0.79s)

All 1 tests passed (0.79s)
++ echo 5
5
++ ../../dist/brig-integration -s brig.integration.yaml -i ../integration.yaml -p sso-id
```

When a failure happens:

```
++ echo 282
282
++ ../../dist/brig-integration -s brig.integration.yaml -i ../integration.yaml -p sso-id
2021-10-20T14:41:25Z, D, Connecting to 127.0.0.1:9042
[brig] W, logger=cassandra.brig, Server warning: Read 0 live rows and 2102 tombstone cells for query SELECT * FROM brig_test.users_pending_activation WHERE  LIMIT 10000 (see tombstone_warn_threshold)
Brig API Integration
  user
    account
      put /i/users/:uid/sso-id: 2021-10-20T14:41:25Z, I, Known hosts: [datacenter1:rack1:127.0.0.1:9042]
2021-10-20T14:41:25Z, I, New control connection: datacenter1:rack1:127.0.0.1:9042#<socket: 3>
[brig] W, logger=cassandra.brig, Server warning: Read 0 live rows and 2104 tombstone cells for query SELECT * FROM brig_test.users_pending_activation WHERE  LIMIT 10000 (see tombstone_warn_threshold)
FAIL
        Exception: Assertions failed:
         1: 202 =/= 403
         2: updatePhone (PUT /self/phone): failed to update to Phone {fromPhone = "+046965171332989"} - might be a flaky test tracked in https://wearezeta.atlassian.net/browse/BE-526

        Response was:

        Response {responseStatus = Status {statusCode = 403, statusMessage = "Forbidden"}, responseVersion = HTTP/1.1, responseHeaders = [("Transfer-Encoding","chunked"),("Date","Wed, 20 Oct 2021 14:41:27 GMT"),("Server","Warp/3.3.13"),("Content-Encoding","gzip"),("Content-Type","application/json")], responseBody = Just "{\"code\":403,\"message\":\"The given phone number has been blacklisted due to suspected abuse or a complaint.\",\"label\":\"blacklisted-phone\"}", responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose}
        CallStack (from HasCallStack):
          error, called at src/Bilge/Assert.hs:89:5 in bilge-0.22.0-5tCtgpJGKRb38JsbN4shGd:Bilge.Assert
          <!!, called at src/Bilge/Assert.hs:107:19 in bilge-0.22.0-5tCtgpJGKRb38JsbN4shGd:Bilge.Assert
          !!!, called at test/integration/Util.hs:735:3 in main:Util
          updatePhone, called at test/integration/API/User/Account.hs:1230:11 in main:API.User.Account

1 out of 1 tests failed (0.79s)
Terminated
Terminated
[brig] W, logger=cassandra.brig, Server warning: Read 0 live rows and 2106 tombstone cells for query SELECT * FROM brig_test.users_pending_activation WHERE  LIMIT 10000 (see tombstone_warn_threshold)
make: *** [Makefile:114: flake-sso-id] Error 1

```

* updatePhone deflake (#1874)

* updatePhone deflake debugging information

This is about https://wearezeta.atlassian.net/browse/BE-526

I think what's happening is that one test that tests the phone blocking
adds a record into the brig.excluded_phones entry. Then, another,
unrelated test, if unlucky enough to randomly generate a phone number
contained under that prefix, fails in the PUT /self/phone call.
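
The suspected mechanism can be sketched as a simple prefix check. This only illustrates the hypothesis above: `isExcluded` is a hypothetical name, and how brig actually matches numbers against excluded prefixes is an assumption here.

```haskell
import Data.List (isPrefixOf)

-- | A phone number counts as excluded if any stored prefix is a prefix of it.
-- A prefix left behind by one test can therefore match a number that another
-- test generates at random.
isExcluded :: [String] -> String -> Bool
isExcluded excludedPrefixes phone = any (`isPrefixOf` phone) excludedPrefixes
```

For instance, if an earlier test had stored the (made-up) prefix `+04696517`, the number `+046965171332989` from the failure output would be rejected even though its own test never excluded it.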

* 1) update integration test output to give better information and link
  to a flaky test description
* 2) change the code to (hopefully) prevent this flake from recurring.

The changes to integration tests will lead to the following output on
failure:

  user
    account
      put /i/users/:uid/sso-id:
        Exception: Assertions failed:
         1: 202 =/= 403
         2: updatePhone (PUT /self/phone): failed to update to Phone {fromPhone = "+046965171332989"} - might be a flaky test tracked in https://wearezeta.atlassian.net/browse/BE-526

        Response was:

        Response {responseStatus = Status {statusCode = 403, statusMessage = "Forbidden"}, responseVersion = HTTP/1.1, responseHeaders = [("Transfer-Encoding","chunked"),("Date","Wed, 20 Oct 2021 14:41:27 GMT"),("Server","Warp/3.3.13"),("Content-Encoding","gzip"),("Content-Type","application/json")], responseBody = Just "{\"code\":403,\"message\":\"The given phone number has been blacklisted due to suspected abuse or a complaint.\",\"label\":\"blacklisted-phone\"}", responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose}
        CallStack (from HasCallStack):
          error, called at src/Bilge/Assert.hs:89:5 in bilge-0.22.0-5tCtgpJGKRb38JsbN4shGd:Bilge.Assert
          <!!, called at src/Bilge/Assert.hs:107:19 in bilge-0.22.0-5tCtgpJGKRb38JsbN4shGd:Bilge.Assert
          !!!, called at test/integration/Util.hs:735:3 in main:Util
          updatePhone, called at test/integration/API/User/Account.hs:1230:11 in main:API.User.Account

* undo changes in src as they make another test fail

* Add a cleanup line

* fixup

* Hi CI

* Include conv creator only once in notifications sent to remotes (#1879)

To remove any confusion in the `on-conversation-created` federation API, rename
"members" to "non_creator_members", as the creator is already specified in
"orig_user_id".

Also:
- Add Golden tests for `NewRemoteConversation`
- Add integration tests for creating conversation with remote users

* Optimise remote user deletion (#1872)

Creates two Federation RPCs:

* In brig: on-user-deleted, notify about the connections in chunks of 1000 users.
* In galley: on-user-deleted, notify about the conversations in chunks of 1000 conversations.

When writing integration tests in brig, we can mock the federator for brig but not for galley. Because the two RPCs must be made from two separate places, we had to mock out galley to be able to test the brig functionality. The galley functionality is tested separately by calling the internal endpoint.

Co-authored-by: Akshay Mankar <[email protected]>
Co-authored-by: Stefan Matting <[email protected]>

* Set federator's default log level to Info (#1882)

* Rename the two federation/on-user-deleted endpoints (#1883)

* Update Federation API conventions doc in prep for on-user-deleted

* brig/galley: Rename the two federation/on-user-deleted endpoints

This is to ensure that they do not overlap. This will hopefully make it easier
to merge brig and galley.

* Extract type level vars for UserDeleteNotificationMax{Conns,Convs}

* Galley polysemy (1/5) - Introduce Sem and "access" effects (#1881)

* Add type variable to Galley monad

This is step 0 in the process of converting galley to effects. We
introduce a phantom type variable `r` in the `Galley` monad, which will
later be used for the effect row.

* Use API instead of DB access in 1-1 conv test

* Monomorphise Data functions

* Avoid MonadUnliftIO in Bilge.RPC

* Remove unneeded MonadLogger constraint

* Introduce fine-grained placeholder effects

This commit introduces several placeholder effects, mostly having to do
with making HTTP requests. All the existing uses of `MonadUnliftIO` are
now either gone, or hidden behind one of these effects, and that made it
possible to get rid of the `MonadUnliftIO` instance for `Galley`.

Also, the `Galley0` type synonym now refers to `Galley` without any
effects, so `runGalley` and related functions now take a `Galley
GalleyEffects` instead.

`Galley0` still has a `MonadUnliftIO` instance, so it can be used as a
temporary crutch to get access to async primitives. Those need to be run
in `Galley0`, and finally lifted to a general `Galley r` monad.
Eventually, the `Galley0` actions will simply be replaced by effect
actions, and the code actually using `MonadUnliftIO` will be relegated
to interpreters.

* Remove MonadMask instance of Galley

This also introduces a `SparAccess` effect and adds a few more
`BrigAccess` and `BotAccess` constraints.

* Remove MonadCatch instance of Galley

* Turn Galley into a Sem newtype

The underlying `Sem` monad in `Galley` is an arbitrary effect stack that
contains at least the effects which replicate the functionality of the
original `Galley` monad. All the functionality has been reimplemented in
terms of `Sem`, so the existing code does not need to be changed at all.

* Allow configuring nginz so it serves the deeplink for apps to discover the backend (#1889)

Allow nginz to serve a deeplink (see also https://docs.wire.com/how-to/associate/deeplink.html )

Co-authored-by: jschaul <[email protected]>

* upgrade webapp to federation-capable (not for production use!) version. (#1892)

* Release_2021_10_29 (#1893)

* Make federated connection functions work with qualified IDs (#1819)

* Add stub for remote connection creation

* Make connection DB functions work with Qualified

* Simplify name of createConnection

* Fix order of arguments in createConnection

* Do not assert on 1-1 conversation names

* Use Local newtype for some more local arguments

Co-authored-by: jschaul <[email protected]>

* Fix detail in stern online help (#1834)

* Spar Polysemy: SAML2 effect (#1827)

* Use Input effect instead of a MonadReader instance

* Remove ReaderT

* Fix package.yaml

* Changelog

* Review responses

* SAML work

Remove undefineds

Interpreting is really hard

Interpret everything

wip

Add toggleCookie to SAML2

Add Now effect

get it compiling

build

Remove HasCreateUUID instance for Spar

* Cleanup

* CanonicalInterpreter and necessary changes

* Rename to SPImpl

* Fake CI

* Another fake CI

* Use catch in polysemy

* Respond to review

* Changelog

* Apply suggestions from code review

Co-authored-by: fisx <[email protected]>

* Hi CI

* make format

Co-authored-by: fisx <[email protected]>

* Spar Polysemy: Fully polysemize Spar (#1833)

* Remove wrapMonadClientSem

Put it into the Cassandra interpreter instead

* Remove MonadIO instance

* Remove MonadError instance

* Remove ExceptT

* Remove Final IO from Spar

* Fix one use of undefined

* Reporter effect; NO MORE IO

* Remove the Spar newtype

* Remove Spar type

* Stylistic cleanup

* Changelog

* Weird rebase problem

* Review comments

* Use hs-certificate master (#1822)

* Use master branch of hs-certificate

The error handling fix
https://github.com/vincenthz/hs-certificate/pull/125 has been merged, so
we can just use the upstream master now, and later switch to the hackage
package once it is released.

* Servantify legacy addMember endpoint (#1838)

* Use helmfile's parallelism to speed up integration test setup time (#1805)

Motivation: decrease integration setup time, especially for the default two-backend setup. Make use of tooling used elsewhere, and use less of hacky bash scripts. See also https://wearezeta.atlassian.net/wiki/spaces/PS/pages/513573957/CI+runs+of+wire-server+state+and+possible+improvements for a discussion of other CI improvement opportunities.

This should shave about 5 minutes off the setup time of each CI run, simply because all helm charts for both backends are now installed in parallel rather than sequentially (that is, `make kube-integration-setup` should now be faster than before this PR).

- Create a few FUTUREWORKS in Jira and link to them from the code comments
- Create two helmfiles, one for federation, one for single-backend
- Add helmfile to nix-shell tooling (Helmfile itself comes with a different version of helm; but since so
far things inside nix-shell are only in use for local development, this
should not matter too much. In the future this can be streamlined with
wire-server-deploy to use the same versions everywhere)

* [Federation] Include Remote Connections in Listing All Connections (#1826)

* Expand a test to also include remote connections while listing

* Remove deprecated endpoint for listing convs (#1840)

* Remove deprecated endpoint for listing convs

Also removed the V2 from the name of the endpoint (in the code, not in
the endpoint path).

* Remove /list-conversations from nginx conf

* Remove use of /list-conversations from End2end

* Federation: Allow connecting to remote users (#1824)

One2One conversations are not created yet. This will be worked upon separately.
Legal-hold restrictions are not dealt with yet either; for now, it will not be allowed to turn on legal-hold and federation at the same time.

Co-authored-by: Stefan Matting <[email protected]>
Co-authored-by: jschaul <[email protected]>
Co-authored-by: Akshay Mankar <[email protected]>

* Fix more swagger validation errors (#1841)

* Fix more swagger validation errors

These could be prevented by turning some lists to sets in the swagger2
package, but for now we simply go through all the schemas in the
`Swagger` structure, and apply `nub` on them.

* Various cleanups of Qualified and related types (#1839)

* Refactor tagged Qualified types

This makes the `Local` and `Remote` type constructor safer, because now
it is not possible to change the domain inside a tagged value using the
`Functor` instance.

* Rename `partitionQualified` to `indexQualified`

* Refactor partitionRemoteOrLocalIds

Also rename it to partitionQualified and swap the order of results.

* Refactor and rename `partitionRemote`

The `partitionRemote` function has been renamed to `indexRemote` for
consistency with `indexQualified`, and it now returns a list of `Remote
[a]`, which preserves the information about the domains being remote.
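
A minimal sketch of grouping qualified values by domain, assuming simplified stand-in types. The function name `bucketByDomain` is hypothetical, and a plain `(Domain, [a])` pair stands in for the `Remote [a]` mentioned above.

```haskell
import qualified Data.Map.Strict as Map

-- Simplified stand-ins, not the real wire-server definitions.
newtype Domain = Domain String
  deriving (Eq, Ord, Show)

data Qualified a = Qualified {qUnqualified :: a, qDomain :: Domain}
  deriving (Eq, Show)

-- Group values by their domain. Each bucket keeps its domain next to the
-- values, so the caller never loses track of where a list came from.
-- `flip (++)` appends later values after earlier ones, preserving input order.
bucketByDomain :: [Qualified a] -> [(Domain, [a])]
bucketByDomain qs =
  Map.toList
    (Map.fromListWith (flip (++)) [(qDomain q, [qUnqualified q]) | q <- qs])
```

Returning one bucket per domain is what makes it convenient to issue a single request per remote backend afterwards.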

* Remove some uses of toRemoteUnsafe

* Remove convId from ConversationMetadata

Also change type of toRemoteUnsafe and toLocalUnsafe to just take a `Domain` and
an `a` instead of `Qualified a`.

* Remove one more use of toRemoteUnsafe

* Remove lUnqualified and lDomain

We can simply use the general versions that work for both qualified
tags.

* Remove renderQualified and corresponding test

It was completely unused.

* Use data kinds for Id tags

* Better schema instance for `Qualified` values

* Add CHANGELOG entry

* Create remote 1-1 conversations (#1825)

* Extract function to create UserList

* Add stub for remote 1-1 conversation creation

* Compute remote 1-1 conversation IDs

* ensureConnected now takes a UserList

* Make /conversations/one2one federation-aware

Converted the endpoint for creating 1-1 conversations to the new
conversation ID algorithm, and enabled the endpoint to create 1-1
conversations with federated users.

Note: the case when the conversation needs to be hosted by the remote
domain is still not implemented. We probably need a new RPC for this
case.

* Remove create from UUID Version class

The create function cannot be defined for all UUID versions.

* Introduce V5 UUIDs and use them for 1-1 conv

* Servantify internal endpoint for connect conv

* Make recipient field of connect event qualified

* Extract function to create legacy connect conv

* Add tests for the conversation ID algorithm

* write internal with stubs for data functions

* Implement a function for creating and updating a 1-1 remote conversation

- The function is Galley.API.One2One.iUpsertOne2OneConversation

* use schema-profunctor for json instances

galley-types: no lax

* galley-types rename module to Intra

* galley: remove "these" dep

galley.cabal

* fix impossible example

* remove todo

* un-nameclash: one2OneConvId -> localOne2OneConvId

* remove warning suppression

* brig: add rpc function

* change api: always return a conv id

* Add tests for one2one conversation internal endpoint

* Test remote one2one conversation case

* Update golden tests after change in connect event

* Add CHANGELOG entry

* Remove incorrect comment

Co-authored-by: Marko Dimjašević <[email protected]>
Co-authored-by: Stefan Matting <[email protected]>

* Leave a note with a link to a Jira ticket about a flaky test (#1844)

* Make non-collision test for 1-1 conv ids faster (#1846)

The `anySame` function has quadratic runtime, but here we can use an
`Ord` instance, and just compare the `nubOrd` lists. This also removes a
potential flakiness caused by repeated input pairs (which should be
quite likely to happen, given the low entropy of the UUID generator).
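
To make the difference concrete, here is a small sketch of both approaches. `anySameQuadratic` is an illustrative reimplementation rather than the original `anySame`, and `noCollisions` is a hypothetical name for one possible reading of "compare the `nubOrd` lists": deduplicate inputs and outputs separately and compare the counts.

```haskell
import Data.Containers.ListUtils (nubOrd)
import Data.List (tails)

-- Quadratic baseline: compare every element with every later element.
anySameQuadratic :: Eq a => [a] -> Bool
anySameQuadratic xs = or [x == y | (x : ys) <- tails xs, y <- ys]

-- O(n log n) check using Ord. The function `f` is collision-free on the
-- given inputs iff it produces as many distinct outputs as there are
-- distinct inputs. Deduplicating the inputs first means a repeated input
-- no longer shows up as a false collision.
noCollisions :: (Ord a, Ord b) => (a -> b) -> [a] -> Bool
noCollisions f inputs =
  length (nubOrd (map f inputs)) == length (nubOrd inputs)
```

For example, `noCollisions (* 2) [1, 2, 2, 3]` holds even though the input `2` is repeated, whereas a naive "any two outputs equal" check would report a collision.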

* add comment to test for FUTUREWORK (#1848)

* Fix error in member csv creation (SAML.UserRef decoding error) (#1828)

* Add failing test case.

* Nit-pick.

* Do not git-ignore pem files (at least not all of them).

* Fix error message.

* More detail in scim error responses.

* An idea.

* Implement the idea.

* FUTUREWORK.

* Update One2One conversation when connection status changes (#1850)

* move one2oneConvId to galley-types

* implement updateOne2OneConv and simple test

* add more test cases

* Clarify 403 in test

* add changelog entry

* chore: [charts] Update webapp version (#1836)

Co-authored-by: Zebot <[email protected]>

* chore: [charts] Update team-settings version (#1835)

Co-authored-by: Zebot <[email protected]>

* update to latest SFT. (#1849)

* update to latest SFT.

* Add changelog entry for SFT

Co-authored-by: jschaul <[email protected]>

* Upgrade webapp/team-settings: changelog entries for #1835 and #1836 (#1856)

* Fix SFTD in umbrella chart (#1677)

* Fix SFTD in umbrella chart

* changelog

Co-authored-by: jschaul <[email protected]>

* Move SFTD public IP docs to the top (#1672)

It's the thing people confuse the most. Hopefully people will get it wrong less often now.

* [charts:sftd] Introduce flag to enable TURN discovery (#1519)

* [charts:sftd] Introduce flag to enable TURN discovery

* -f integrate review feedback

* changelog

Co-authored-by: jschaul <[email protected]>

* Check extended key usage of server certificates (#1855)

* Test that server key usage is checked for fed cert

* Reject certificates without server usage flag

* Access updates affect remote users (#1854)

* Rename NotificationTargets to BotsAndMembers

* Refactor logic to remove users after access update

 - Avoid using lenses and state; since there are only two updates, these
 can be threaded manually pretty easily.
 - Rename the `NotificationTargets` type to `BotsAndMembers`, and use
 that instead of pairs (or triples) in the access update function.

This endpoint is still not properly federation-aware, since remote
members are not removed, and local member removals are not propagated to
remotes.

Co-authored-by: Stefan Matting <[email protected]>

* Re-enable multiple victims when removing members

This is useful to batch removals occurring after an access update to a
conversation.

* Remove and notify remotes on access update

* Access update removal tests

* Remove duplication in test conversation creation

Co-authored-by: Paolo Capriotti <[email protected]>
Co-authored-by: Marko Dimjašević <[email protected]>

* Change tag (#1859)

* Check connections when adding remote users to a conv (#1842)

* Delete stale FUTUREWORK

* Brig: delete deprecated `GET /i/users/connections-status` endpoint

* brig: Servantify POST /i/users/connection-status

* brig: Add internal endpoint to get qualified connection statuses

* Brig: Support creating accepted connections for tests

The endpoint just creates DB entries without actually contacting the remote
backend. This is very useful when galley tests need a remote connection to exist

* wire-api: roundtrip test for To/FromByteString @Relation

The instances were deleted a couple of commits ago.

* Check conn between adder and remotes when adding remotes to conv

* Check connection between conversation creator and remote members

* Do connection checking in onConversationCreated in the federation API

* Make existing federation tests succeed again by sprinkling some connections

* Add a (still failing) test for on-conversation-created

* Add more connections to pass federation API tests

* onConvCreated: Ensure creator of conv is included as other member

* More coverage for onConvCreated

* onConvUpdated: Only allow connected users to add local users

* Add test case: Only unconnected users to add

* Fix integration tests

Co-authored-by: Marko Dimjašević <[email protected]>
Co-authored-by: jschaul <[email protected]>
Co-authored-by: Stefan Matting <[email protected]>
Co-authored-by: Paolo Capriotti <[email protected]>

* Make conversation creator unqualified in on-conversation-created RPC (#1858)

* Unqualify rcOrigId in `on-conversation-created`

Also add some Remote and Local tags to various functions.

* Simplify partitioning in onConversationCreated

* Improve comment about creator ID in RPC

* Ensure creator in the conv domain in tests

Co-authored-by: jschaul <[email protected]>

* Parallelise RPCs (#1860)

* Add runFederatedConcurrently utility

* Parallelise remote conversation notification

* Add Local and Remote tags to profile functions

* Parallelise RPCs for fetching profiles

* Rename indexRemote to bucketRemote

This makes it consistent with indexQualified and bucketQualified.

* Move traverseWithErrors to Util module

* Parallelise claimMultiPrekeyBundles

* Close GRPC client after making a request to a remote federator (#1865)

* Add Resource effect to InternalServer stack

* Ensure GRPC clients are closed after a request

* Allow using kind cluster with imagePullPolicy=Never (#1862)

* Allow using kind cluster with imagePullPolicy=Never

drive-by fix: create namespace if it doesn't exist yet

* Update helm version in nix-shell to fit version used elsewhere

* set kind kubeconfig permissions correctly

* fixup helmfile

* Hi CI

* disable flaky test in gundeck (#1867)

* disable flaky test in gundeck

* Hi CI

* Check connections when creating group and team convs with remote members (#1870)

* Remove unnecessary remote domain from mock federator

* Remove unnecessary check for remote users' existence in createConv

Since we check for connections, we don't need to also find out if the users
exist.

* Check remote connections when creating team conv

Just like for regular group conversations, do not fetch profiles, and
instead check both local and remote connections.

Also added failure tests for team conversation creation with unconnected
locals or remotes.

* Remove opts argument for mock federator

* Add CHANGELOG entries

Co-authored-by: Paolo Capriotti <[email protected]>

* minor Readme: document usage of helm charts (#1307)

* Support deleting conversations with federated users (#1861)

* Refactor: Use pushConversationEvent

* add onConversationDeleted RPC

* deleteTeamConversation: rpc onConversationDeleted

* Data.deleteConversation: remove remotes

* add changelog entry

* wire-api: extend ConversationAction

* onConversationDeleted -> onConversationUpdated

* fix compilation

* remove duplicated import

* cosmetic change

* fix call to withTempServantMockFederator

* Remove a leftover TODO that was addressed (#1868)

* In Conversation Endpoints Make the members.self ID Qualified (#1866)

* Make the self member's ID qualified
* Simplify conversation view functions
* Unrelated small change: remove a cycle of qualifying a conversation ID in a test
* Introduce qualifyLocal to the BotNet monad

* Changelog script: skip empty sections (#1871)

* Replace shell.nix with a direnv + nixpkgs.buildEnv based setup (#1876)

* Replace shell.nix with a direnv + nixpkgs.buildEnv based setup

* Add instructions on how to use nix-hls.sh from emacs

* Correctly update PATH in .envrc (#1877)

* Introduce 'make flake-PATTERN' (#1875)

Add a 'make flake-PATTERN' target to run a subset of tests multiple times to trigger a failure case in flaky tests. By default the test(s) will run up to 1000 times until a failure occurs, at which point it will stop. Scrolling up on the output will show you how many tests had to run to trigger a failure.

example output:

```
make flake-sso-id
echo 'set -ex' > /tmp/flake.sh
chmod +x /tmp/flake.sh
for i in $(seq 1000); do \
	echo "echo $i" >> /tmp/flake.sh; \
	echo '../../dist/brig-integration -s brig.integration.yaml -i ../integration.yaml -p "sso-id" ' >> /tmp/flake.sh; \
done
INTEGRATION_USE_NGINZ=1 ../integration.sh /tmp/flake.sh
Running tests using mocked AWS services
[cannon] I, Listening on 127.0.0.1:8083
[cannon] I, Listening on 127.0.0.1:8183
[cargohold] I, Listening on 0.0.0.0:8084
[spar] I, logger=cassandra.spar, Known hosts: [datacenter1:rack1:127.0.0.1:9042]
[federator] D, inotify initialized, inotify=<inotify fd=11>
[gundeck] I, Listening on 0.0.0.0:8086
[galley] I, Listening on 127.0.0.1:8085
[spar] I, Listening on 0.0.0.0:8088
[nginz] 127.0.0.1 - - [20/Oct/2021:16:33:50 +0200] "GET /i/status HTTP/1.1" 200 0 "-" "curl/7.71.1" "-" - 2 0.000 - - - - 3cabaf643c510db36a3c989301d73569
all services are up!
++ echo 1
1
++ ../../dist/brig-integration -s brig.integration.yaml -i ../integration.yaml -p sso-id
2021-10-20T14:33:51Z, D, Connecting to 127.0.0.1:9042
2021-10-20T14:33:51Z, I, Known hosts: [datacenter1:rack1:127.0.0.1:9042]
2021-10-20T14:33:51Z, I, New control connection: datacenter1:rack1:127.0.0.1:9042#<socket: 3>
Brig API Integration
  user
    account
      put /i/users/:uid/sso-id: OK (0.82s)

All 1 tests passed (0.83s)
++ echo 2
2
++ ../../dist/brig-integration -s brig.integration.yaml -i ../integration.yaml -p sso-id
2021-10-20T14:33:53Z, D, Connecting to 127.0.0.1:9042
Brig API Integration
  user
    account
      put /i/users/:uid/sso-id: 2021-10-20T14:33:53Z, I, Known hosts: [datacenter1:rack1:127.0.0.1:9042]
2021-10-20T14:33:53Z, I, New control connection: datacenter1:rack1:127.0.0.1:9042#<socket: 3>
OK (0.85s)

All 1 tests passed (0.85s)
++ echo 3
3
++ ../../dist/brig-integration -s brig.integration.yaml -i ../integration.yaml -p sso-id
2021-10-20T14:33:55Z, D, Connecting to 127.0.0.1:9042
Brig API Integration
  user
    account
      put /i/users/:uid/sso-id: 2021-10-20T14:33:55Z, I, Known hosts: [datacenter1:rack1:127.0.0.1:9042]
2021-10-20T14:33:55Z, I, New control connection: datacenter1:rack1:127.0.0.1:9042#<socket: 3>
OK (0.77s)

All 1 tests passed (0.77s)
++ echo 4
4
++ ../../dist/brig-integration -s brig.integration.yaml -i ../integration.yaml -p sso-id
2021-10-20T14:33:56Z, D, Connecting to 127.0.0.1:9042
Brig API Integration
  user
    account
      put /i/users/:uid/sso-id: 2021-10-20T14:33:56Z, I, Known hosts: [datacenter1:rack1:127.0.0.1:9042]
2021-10-20T14:33:56Z, I, New control connection: datacenter1:rack1:127.0.0.1:9042#<socket: 3>
OK (0.79s)

All 1 tests passed (0.79s)
++ echo 5
5
++ ../../dist/brig-integration -s brig.integration.yaml -i ../integration.yaml -p sso-id
```

When a failure happens:

```
++ echo 282
282
++ ../../dist/brig-integration -s brig.integration.yaml -i ../integration.yaml -p sso-id
2021-10-20T14:41:25Z, D, Connecting to 127.0.0.1:9042
[brig] W, logger=cassandra.brig, Server warning: Read 0 live rows and 2102 tombstone cells for query SELECT * FROM brig_test.users_pending_activation WHERE  LIMIT 10000 (see tombstone_warn_threshold)
Brig API Integration
  user
    account
      put /i/users/:uid/sso-id: 2021-10-20T14:41:25Z, I, Known hosts: [datacenter1:rack1:127.0.0.1:9042]
2021-10-20T14:41:25Z, I, New control connection: datacenter1:rack1:127.0.0.1:9042#<socket: 3>
[brig] W, logger=cassandra.brig, Server warning: Read 0 live rows and 2104 tombstone cells for query SELECT * FROM brig_test.users_pending_activation WHERE  LIMIT 10000 (see tombstone_warn_threshold)
FAIL
        Exception: Assertions failed:
         1: 202 =/= 403
         2: updatePhone (PUT /self/phone): failed to update to Phone {fromPhone = "+046965171332989"} - might be a flaky test tracked in https://wearezeta.atlassian.net/browse/BE-526

        Response was:

        Response {responseStatus = Status {statusCode = 403, statusMessage = "Forbidden"}, responseVersion = HTTP/1.1, responseHeaders = [("Transfer-Encoding","chunked"),("Date","Wed, 20 Oct 2021 14:41:27 GMT"),("Server","Warp/3.3.13"),("Content-Encoding","gzip"),("Content-Type","application/json")], responseBody = Just "{\"code\":403,\"message\":\"The given phone number has been blacklisted due to suspected abuse or a complaint.\",\"label\":\"blacklisted-phone\"}", responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose}
        CallStack (from HasCallStack):
          error, called at src/Bilge/Assert.hs:89:5 in bilge-0.22.0-5tCtgpJGKRb38JsbN4shGd:Bilge.Assert
          <!!, called at src/Bilge/Assert.hs:107:19 in bilge-0.22.0-5tCtgpJGKRb38JsbN4shGd:Bilge.Assert
          !!!, called at test/integration/Util.hs:735:3 in main:Util
          updatePhone, called at test/integration/API/User/Account.hs:1230:11 in main:API.User.Account

1 out of 1 tests failed (0.79s)
Terminated
Terminated
[brig] W, logger=cassandra.brig, Server warning: Read 0 live rows and 2106 tombstone cells for query SELECT * FROM brig_test.users_pending_activation WHERE  LIMIT 10000 (see tombstone_warn_threshold)
make: *** [Makefile:114: flake-sso-id] Error 1

```

* updatePhone deflake (#1874)

* updatePhone deflake debugging information

This is about https://wearezeta.atlassian.net/browse/BE-526

I think what's happening is that one test that tests the phone blocking
adds a record into the brig.excluded_phones entry. Then, another,
unrelated test, if unlucky enough to randomly generate a phone number
contained under that prefix, fails in the PUT /self/phone call.
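
The suspected failure mode can be sketched as a simple prefix check. This is only an illustration of the hypothesis above: `isExcluded` and the example prefix are made up for this sketch, and the actual lookup in brig may work differently.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical model of prefix-based blocking: a phone number is rejected
-- if any stored prefix is a prefix of it.
isExcluded :: [String] -> String -> Bool
isExcluded prefixes phone = any (`isPrefixOf` phone) prefixes

-- A prefix left behind by one test (made-up value) ...
leftoverPrefixes :: [String]
leftoverPrefixes = ["+0469"]

-- ... would reject an unrelated, randomly generated number that happens to
-- start with it (this number is the one from the failure output).
unluckyPhone :: String
unluckyPhone = "+046965171332989"
```

Under this model `isExcluded leftoverPrefixes unluckyPhone` is true, which would match the observed 403 with the `blacklisted-phone` label.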

* 1) update integration test output to give better information and link
  to a flaky test description
* 2) change the code to (hopefully) prevent this flake from recurring.

The changes to integration tests will lead to the following output on
failure:

  user
    account
      put /i/users/:uid/sso-id:
        Exception: Assertions failed:
         1: 202 =/= 403
         2: updatePhone (PUT /self/phone): failed to update to Phone {fromPhone = "+046965171332989"} - might be a flaky test tracked in https://wearezeta.atlassian.net/browse/BE-526

        Response was:

        Response {responseStatus = Status {statusCode = 403, statusMessage = "Forbidden"}, responseVersion = HTTP/1.1, responseHeaders = [("Transfer-Encoding","chunked"),("Date","Wed, 20 Oct 2021 14:41:27 GMT"),("Server","Warp/3.3.13"),("Content-Encoding","gzip"),("Content-Type","application/json")], responseBody = Just "{\"code\":403,\"message\":\"The given phone number has been blacklisted due to suspected abuse or a complaint.\",\"label\":\"blacklisted-phone\"}", responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose}
        CallStack (from HasCallStack):
          error, called at src/Bilge/Assert.hs:89:5 in bilge-0.22.0-5tCtgpJGKRb38JsbN4shGd:Bilge.Assert
          <!!, called at src/Bilge/Assert.hs:107:19 in bilge-0.22.0-5tCtgpJGKRb38JsbN4shGd:Bilge.Assert
          !!!, called at test/integration/Util.hs:735:3 in main:Util
          updatePhone, called at test/integration/API/User/Account.hs:1230:11 in main:API.User.Account

* undo changes in src as they make another test fail

* Add a cleanup line

* fixup

* Hi CI

* Include conv creator only once in notifications sent to remotes (#1879)

To remove any confusion in the `on-conversation-created` federation API, rename
"members" to "non_creator_members", as the creator is already specified in
"orig_user_id".

Also:
- Add Golden tests for `NewRemoteConversation`
- Add integration tests for creating conversation with remote users

* Optimise remote user deletion (#1872)

Creates two Federation RPCs:

* In brig: on-user-deleted, notify about the connections in chunks of 1000 users.
* In galley: on-user-deleted, notify about the conversations in chunks of 1000 conversations
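
The chunking itself can be sketched as follows. Both `chunksOf` and `notifyInChunks` are written here for illustration and are not taken from the PR; the `send` argument is a placeholder for whatever performs the actual RPC.

```haskell
-- Split a list into consecutive chunks of at most n elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | n <= 0 = error "chunksOf: chunk size must be positive"
  | null xs = []
  | otherwise =
      let (chunk, rest) = splitAt n xs
       in chunk : chunksOf n rest

-- Run one notification action per chunk, in order.
notifyInChunks :: Monad m => Int -> ([a] -> m ()) -> [a] -> m ()
notifyInChunks n send = mapM_ send . chunksOf n
```

With a chunk size of 1000, a user with 2500 connections would trigger three notifications carrying 1000, 1000 and 500 entries.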

When writing integration tests in brig, we can mock the federator for brig but not for galley. Since the two RPCs must be made from two separate places, we had to mock out galley to be able to test the brig functionality. The galley functionality is tested separately by calling the internal endpoint.

Co-authored-by: Akshay Mankar <[email protected]>
Co-authored-by: Stefan Matting <[email protected]>

* Set federator's default log level to Info (#1882)

* Rename the two federation/on-user-deleted endpoints (#1883)

* Update Federation API conventions doc in prep for on-user-deleted

* brig/galley: Rename the two federation/on-user-deleted endpoints

This is to ensure that they do not overlap. This will hopefully make it easier
to merge brig and galley.

* Extract type level vars for UserDeleteNotificationMax{Conns,Convs}

* Galley polysemy (1/5) - Introduce Sem and "access" effects (#1881)

* Add type variable to Galley monad

This is step 0 in the process of converting galley to effects. We
introduce a phantom type variable `r` in the `Galley` monad, which will
later be used for the effect row.
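
A minimal sketch of what such a phantom parameter looks like, using a deliberately simplified `Galley` built on a plain function instead of the real monad stack. `Env`, `runGalley` and `retag` are hypothetical names for this example.

```haskell
-- Hypothetical environment; the real one carries far more.
newtype Env = Env {envName :: String}

-- `r` does not occur on the right-hand side, so it is a phantom parameter:
-- it costs nothing at runtime and can later hold the effect row.
newtype Galley r a = Galley {unGalley :: Env -> IO a}

instance Functor (Galley r) where
  fmap f (Galley g) = Galley (fmap f . g)

instance Applicative (Galley r) where
  pure = Galley . const . pure
  Galley f <*> Galley g = Galley (\env -> f env <*> g env)

instance Monad (Galley r) where
  Galley g >>= k = Galley (\env -> g env >>= \a -> unGalley (k a) env)

runGalley :: Env -> Galley r a -> IO a
runGalley env (Galley g) = g env

-- Because `r` is phantom, changing it is a no-op on the underlying action.
retag :: Galley r a -> Galley r' a
retag (Galley g) = Galley g
```

Since existing code never inspects `r`, adding the parameter is a purely mechanical change; constraints on `r` can then be introduced function by function.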

* Use API instead of DB access in 1-1 conv test

* Monomorphise Data functions

* Avoid MonadUnliftIO in Bilge.RPC

* Remove unneeded MonadLogger constraint

* Introduce fine-grained placeholder effects

This commit introduces several placeholder effects, mostly having to do
with making HTTP requests. All the existing uses of `MonadUnliftIO` are
now either gone, or hidden behind one of these effects, and that made it
possible to get rid of the `MonadUnliftIO` instance for `Galley`.

Also, the `Galley0` type synonym now refers to `Galley` without any
effects, so `runGalley` and related functions now take a `Galley
GalleyEffects` instead.

`Galley0` still has a `MonadUnliftIO` instance, so it can be used as a
temporary crutch to get access to async primitives. Those need to be run
in `Galley0`, and finally lifted to a general `Galley r` monad.
Eventually, the `Galley0` actions will simply be replaced by effect
actions, and the code actually using `MonadUnliftIO` will be relegated
to interpreters.

* Remove MonadMask instance of Galley

This also introduces a `SparAccess` effect and adds a few more
`BrigAccess` and `BotAccess` constraints.

* Remove MonadCatch instance of Galley

* Turn Galley into a Sem newtype

The underlying `Sem` monad in `Galley` is an arbitrary effect stack that
contains at least the effects which replicate the functionality of the
original `Galley` monad. All the functionality has been reimplemented in
terms of `Sem`, so the existing code does not need to be changed at all.

* Allow configuring nginz so it serves the deeplink for apps to discover the backend (#1889)

Allow nginz to serve a deeplink (see also https://docs.wire.com/how-to/associate/deeplink.html )

Co-authored-by: jschaul <[email protected]>

* upgrade webapp to federation-capable (not for production use!) version. (#1892)

* Release 2021_10_29

Co-authored-by: Paolo Capriotti <[email protected]>
Co-authored-by: jschaul <[email protected]>
Co-authored-by: fisx <[email protected]>
Co-authored-by: Sandy Maguire <[email protected]>
Co-authored-by: Marko Dimjašević <[email protected]>
Co-authored-by: Stefan Matting <[email protected]>
Co-authored-by: Akshay Mankar <[email protected]>
Co-authored-by: Stefan Matting <[email protected]>
Co-authored-by: zebot <[email protected]>
Co-authored-by: Zebot <[email protected]>
Co-authored-by: Arian van Putten <[email protected]>
Co-authored-by: Lucendio <[email protected]>

* [feature config] self-deleting messages (#1857)

* Add self-deleting messages feature config.

* Fix: cassandra's update doesn't work as you'd think!

* Ormolu.

* make git-add-cassandra-schema

* Fix syntax error in cql query.

* Changelog.

* Re-align with changes in Data.Id.

Co-authored-by: Paolo Capriotti <[email protected]>
Co-authored-by: jschaul <[email protected]>
Co-authored-by: Sandy Maguire <[email protected]>
Co-authored-by: Marko Dimjašević <[email protected]>
Co-authored-by: Stefan Matting <[email protected]>
Co-authored-by: Akshay Mankar <[email protected]>
Co-authored-by: Stefan Matting <[email protected]>
Co-authored-by: zebot <[email protected]>
Co-authored-by: Zebot <[email protected]>
Co-authored-by: Julia Longtin <[email protected]>
Co-authored-by: Arian van Putten <[email protected]>
Co-authored-by: Lucendio <[email protected]>