Skip to content

Conversation

turt2live
Copy link
Member

@turt2live turt2live added e2e client-server Client-Server API kind:core MSC which is critical to the protocol's success labels Sep 27, 2021
@turt2live turt2live changed the title MSC3414: Encrypted state events [WIP] MSC3414: Encrypted state events Sep 27, 2021
@turt2live turt2live marked this pull request as draft September 27, 2021 02:41
@turt2live turt2live added the proposal A matrix spec change proposal label Sep 27, 2021
@turt2live turt2live added the needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. label Sep 27, 2021

## Proposal

Under this proposal, all room state events can be encrypted with the exception of events critical to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Practically speaking, m.room.member is one of the most important things to encrypt - I wonder if this MSC should propose a way to encrypt the redactable bits of the payloads via megolm too. (It could be as easy as keeping the event type the same, but having an m.encrypted mix-in for the encrypted fields?)

(Also, worth noting that Synapse currently depends on m.room.name and m.room.topic for searching the roomdir)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(actually, we could split this off into its own MSC, i guess, but having a generic mechanism of encrypting the redactable fields of a state event without breaking its event type feel potentially useful)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd probably be best to split this off to its own MSC, but it's also somewhat worth considering whether we'll have profiles-as-rooms (and how that works for per-room profiles) in a reasonable timeframe. If yes, then we might not need to encrypt half of an event, but if no then it's definitely worth pursuing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think only for m.room.member it makes sense to encrypt the redactable bits, but not the whole event (since the server needs to know membership, statekey and event type). For the other state events we can actually do all or nothing. So we could also craft an alternative just for member events, where we add an m.room.member_profile and put all the redactable bits there. Then the m.room.member would just have the membership, state_key and type fields and we would potentially also ease the load on state res, because a name change wouldn't potentially affect what events a user can send. profiles-as-rooms could solve that too, but are a much bigger task with its own downsides, if you want to encrypt as much metadata as possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitting profiles away from membership events is a bit non-trivial for this MSC, but possibly a future one.

All other state event types are able to be encrypted, if the room itself is encrypted.

An encrypted state event looks very similar to a regular encrypted room message: the `type` becomes
`m.room.encrypted` and the `content` is the same shape as a regular `m.room.encrypted` event. The
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little scared about clobbering the state event's event type. For one thing, it means that PLs can't be enforced for that event type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they can't be enforced, but the (undocumented) thought on this is that we can have clients enforce the restrictions locally, as they already kinda need to for every other event type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be very challenging to implement, since clients usually don't have the full history cached and as such might not know, when a power level changed. This might lead to clients discarding old state events as invalid, because they think the sender was not allowed to send it. In the end I do appreciate though, that the state key and event type are not visible in plain text anymore. While that would make the implementation much easier, in a lot of cases that is the interesting data of a state event.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clients have access to /context though?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is just a bit complicated and there will be bugs, imo. I still prefer this approach in general though. (Also I didn't know that /context returned state events)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm honestly going to need a compelling argument to not hide the type/state key, given we do the same approach for timelines.

Copy link
Member

@kegsay kegsay Sep 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to do the same trick as https://github.com/matrix-org/matrix-spec-proposals/blob/kegan/persist-edu/proposals/4354-sticky-events.md if you want to actually converge when doing client side conflict resolution with PLs.

Basically, it's not safe to drop any events and you have to persist them all, regardless of whether they are authorised or not.

Comment on lines +51 to +52
clients should then respond with the appropriate decryption keys regardless of device/user trust. Note
that this effectively means a client should use a dedicated megolm session when working with room state.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really is the most important part imo in this MSC and needs to be more explicit and a bit more nuanced. You will probably need a different megolm session for each event_type+state_key tuple. Otherwise you risk leaking old state to new members.

For example if we use the same session for the room name and the topic and we set it like this:

event_type megolm_index
m.room.name 0
m.room.topic 1
m.room.topic 2

Then we leak the old topic to a new joiner, if we share the room name with them. That also means we will have quite a few megolm sessions per room, if we try to prevent that by having one session per newest state event. On the other hand the impact would be limited by the rotation of megolm keys after a week, if that still applies to state events.

To me this is the critical part of this MSC and deserves additional attention, so that clients can implement it correctly and securely.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What sort of leak are you thinking of? This would be no different than state events today when someone joins a room, and history visibility would more often be used to protect old state from prying eyes.

Copy link
Contributor

@deepbluev7 deepbluev7 Sep 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you don't want to leak old messages, you protect that by only sharing indices from where the user was invited/joined, I'd say the same should apply to state events. A user should only be able to decrypt the state from when they joined/got invited. You don't rely on history visibility in encrypted rooms, because that would mean trusting the server. But currently it is impossible to share the decryption key for index 0 and 2 without also sharing 1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that nuance is supposed to be implied as part of "use matrix crypto" - I'm very keen to not duplicate the entire crypto spec into this MSC. This MSC only goes as far as describing how the existing systems don't work for this setup, but intentionally does not say how the existing rules already work. This applies to all sorts of areas like sharing at particular indices and how keys are sent to people.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying to duplicate the existing spec, but mentioning the danger of using the same session for all different state event types would be appreciated, because it might not be obvious. Especially since not calling out such issues explicitly was one of the reasons for the recent key sharing security issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hot take: leaking old state is fine.

I say this because if and when we ever cut the server out entirely from state resolution then all the clients need to know the state history in order to correctly perform state resolution. Keyhive have a cute solution to this by putting the decryption key for the causal predecessors into the event itself, so knowledge of a single decrypted event allows you to decrypt all the causal predecessors. The only way we could realistically avoid this is if the DAG had some notion of finality, as then you could use the finality boundary as an epoch to delineate where the keys are valid up to.

ID in the state key and using `m.room.encrypted` for the `type` are subject to similar issues.

## Security considerations

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does a user know, if it should share keys with a member? A server could always insert a new member into a room (or even show the new member only to a specific room member). That way it could effectively attack the room to get access to the room state. Messages are protected against that, by tracking who was in the room, when an encrypted message is sent. The user then has the chance to not send the message to the members in the room and keys are only reshared to devices that should have received that message.

This does not apply to room state. You need to share the keys for room state to new members in the room. You can somewhat limit the impact by only sharing the keys with trusted members, but most clients don't do that by default, because it leads to a lot of undecryptable messages.

So the question is:

  • Should the user be prompted before sharing the room state? (This might make joining a room annoying=
  • Should it automatically treat members you invited yourself or you already verified as trusted and automatically share room state? (This still has a risk of a trusted user joining a room, that it shouldn't and getting access to state they shouldn't have.)

is that something clients have to figure out for themselves how secure they want to be or should we have some guidelines, so that we can rely on any client having some standard of security?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MSC outlines that all members who join get the whole room state, regardless of user/device trust.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment was about the server having easy access to the encrypted state by just temporarily adding a device or user into the room and then removing it. This doesn't even have to be in the room state, but could be just sent to a local user via /sync. From what I can tell, this effectively makes the encryption useless, because it can't protect you from the server, just from other users. I don't see where the MSC addresses that concern, but maybe I'm just blind and you could link me to the relevant lines? Because I don't see, how this protects you from the server just requesting the keys for the room state.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I see now. That is a bit of a problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be solved by #3917, which IMO this MSC should wait on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To provide a short update, the concern with malicious devices should be alleviated by MSC4153. Malicious room members remain a problem.

* `m.room.third_party_invite`
* `m.room.history_visibility`
* `m.room.guest_access`
* `m.room.encryption` - Though not critical to the server-side protocol, it is critical to clients.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this list seems to be missing the acl event

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And tombstone events.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tombstones might be more questionable tbh. They're not exactly "critical" to the client any more than the name is, and the server just doesn't care.

Comment on lines +50 to +51
keyshare requests to anyone with larger-than-default power level in the room, where those empowered
clients should then respond with the appropriate decryption keys regardless of device/user trust. Note
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that we should spec out this behaviour. "Larger-than-default" could potentially be a lot of people and it would be best to even scale up those. Maybe send a request to 2, then if they don't respond quickly start ratcheting it up. But either way the exact algorithm is not important, so I think it is best to leave it unspecified so that clients can experiment and find out what works best. For example maybe it is better to contact users who recently sent a message, or it is a good idea to add some randomness to the decision so that not all clients are asking the same admin.

I would replace this with something like "When a client joins a room it should send keyshare requests to existing room members. To avoid unnecessary work it should limit the rate at which it sends requests. It is recommended that clients send request to users of higher power level first."

And maybe you can provide "larger-than-default" as a non-normative example.

Comment on lines +70 to +72
To encrypt an `m.room.name` state event, the client would first determine if there's already an encrypted
room name event. If so, it would use the same `state_key`, but otherwise generate a string to become
the `state_key`. Then, the event is encrypted as normal with the following payload:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To encrypt an `m.room.name` state event, the client would first determine if there's already an encrypted
room name event. If so, it would use the same `state_key`, but otherwise generate a string to become
the `state_key`. Then, the event is encrypted as normal with the following payload:
To encrypt an `m.room.name` state event, the client would first determine if there's already an encrypted
room name event. If so, it would reuse the outer `state_key` as the outer state key of the new event, otherwise the client would generate a random string to become
the outer `state_key`. Then, the event is encrypted as normal with the following payload:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this section very confusing as the inner and outer state keys are not given different names to help differentiate them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"inner" and "outer" are honestly more confusing imo. They imply a parent/child relationship when there isn't one.

Will have a think about how to improve it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "server-visible" and "encrypted"? I agree that inner and outer aren't the best. I don't think they imply parent-child but aim to say that one state-key is inside the encrypted payload and one is outside.

of both rooms and Bob got invited to Room A by Alice. Alice shares keys for all relevant parts of Room A (the
state and timeline under MSC3061) and Room B's state with Bob. If Alice were to add Room C as a room for members
to join Room B with, then Alice would send decryption keys for all of Room B's state with all members of Room
C too.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it would be very noisy. If you imagine a space for a company or similar every topic change of a space-joinable room is going to be sent to all employees. This could be a lot of traffic if you have a couple of rooms have frequently changing topics such as critical issues, current oncall or anything else.

As a worked example, Room A and Room B exist, where Room B is restricted to members of Room A. Alice is admin
of both rooms and Bob got invited to Room A by Alice. Alice shares keys for all relevant parts of Room A (the
state and timeline under MSC3061) and Room B's state with Bob. If Alice were to add Room C as a room for members
to join Room B with, then Alice would send decryption keys for all of Room B's state with all members of Room
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unclear why Alice is doing all of the sharing. Is it because she is an admin of room A or because she is an admin of room B? Or are we assuming that she is making these state event changes. I think this needs to be explicit.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps clarifying what privileges's Bob assumed to have (regular user, so non-admin rights, that's read/write to public rooms, but no invites..., for example?) in this scenario'd also be helpful.


An encrypted state event looks very similar to a regular encrypted room message: the `type` becomes
`m.room.encrypted` and the `content` is the same shape as a regular `m.room.encrypted` event. The
`state_key` for encrypted state events is an arbitrary string used to differentiate different kinds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this means that we lose the ability for the server to mandate that a state_key containing a mxid can only be written by that mxid. We'd need a replacement (e.g. a flag on the unencrypted event payload to say "only let the author edit this event")

[MSC3061](https://github.com/matrix-org/matrix-doc/pull/3061) can be leveraged for explicit invites to new
members of the room, the MSC also does not solve the problem that restricted rooms impose.

To fix the reliance on key sharing, it is proposed that keys be automatically sent whenever restricted-ness
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An interesting idea that someone (i forget who, sorry) proposed is that one could maintain serverside key backups per space (or room). So when you invite a new person into a big space tree of encrypted rooms, you literally give them the key to access the serverside backup for as long as they are a member of the space. Obviously this breaks pfs, but pfs is already gone the second you are sharing keys on invite anyway.

@ghost ghost mentioned this pull request May 15, 2022
Mikaela added a commit to jae1911/matrix-spec-proposals that referenced this pull request Aug 15, 2022
…light dependency

Co-authored-by: Jae Lo Presti <[email protected]>
Signed-off-by: Aminda Suomalainen <[email protected]>
Most events in room state could be encrypted to preserve metadata, which is what this MSC aims
to achieve. Some parts cannot be encrypted though in order to maintain a working protocol.

## Proposal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This MSC is fundamentally incompatible with Sliding Sync (MSC #3575), which relies on the server being able to calculate room names serverside in order to paginate them. Moreover, if it lands, it will fragment the network into SS-capable traffic versus non-SS-capable traffic.

The only mitigation I can see is if there's a homomorphic way we can map the plaintext room & room membership names to ordered items which can be used to order the rooms. (Or we could give up on encrypting m.room.name and profile data, but that /really/ starts to undermine the intentions of this MSC).

Copy link
Contributor

@poljar poljar Jan 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only mitigation I can see is if there's a homomorphic way we can map the plaintext room & room membership names to ordered items which can be used to order the rooms.

I'm not sure if homomorphic encryption would help here. To remind ourselves what homomorphic encryption is about:

Homomorphic encryption is a form of encryption that permits users to perform computations on its encrypted data without first decrypting it. These resulting computations are left in an encrypted form which, when decrypted, result in an identical output to that produced had the operations been performed on the unencrypted data.

So let's make a thought experiment out of this, suppose we have a homomorphic encryption scheme which allows us to order a variable sized array of strings in a alphabetical order. Let's call this function $S(x)$.

Let's say that the array of strings are the display names of all the rooms the user is in, $D$ from now on. This data lives on the server and clients initially don't know the size of the array nor the contents. Each element in $D$ is encrypted and the homeserver can sort $D$ via $S(x)$.

So the client tells the server I want you to sort $D$ and give me a sub-array of those. Skip the first $n$ and give me the remaining $m$ elements, a window into the whole array $D$ if you will.

The server now goes around and does it's thing and computes $R$ via:
$$R = S\left(D\right)$$

Now by definition, each element in $R$ is encrypted and the server can't correlate the elements in $R$ to the ones in $D$.

The client receives $R$ and can decrypt it. It now contains the sub-array of display names for the rooms user is currently interested in. We can suppose that we included the room IDs in $D$ as well for this to be useful.

Finally the client can tell the server, I want to start syncing the rooms with the given room ids. Let us ignore the fact that this second request will leak the alphabetical order of the rooms to the server, and that it might figure out the room names over time while the user is sliding the window while scrolling through the room list.

Let us focus on the fact that this still isn't really what sliding sync wants. Sliding sync wants to know the order of the rooms. It wants to partially decrypt $R$, in other words we need to leak the order to the server. If we don't leak the order to the server, we need to make a second request like it was shown above.

Now if the above is not what you had in mind, and instead you always intended to leak the order to the server, then we arrive at order-preserving encryption[1] (OPE). But as already hinted above, leaking the order leaks more than just the order[2][3]. How much additional data we would be leaking is not clear. Room display names are a long way off from being a perfectly uniform plaintext. This makes OPE schemes a questionable choice.

I would like to be wrong and someone to come along and tell me that what I wrote up above is not how things should work, instead they would work in such and such a way where everything fits, and everything is nicely encrypted. As it stands I don't see how we would make the ordering of rooms by name, that sliding sync wants, and encrypted display names compatible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i was being lazy in referencing homomorphic encryption - in practice OPE is what'd be needed. Agreed that this would be very leaky though, to the extent that it might not be worthwhile at all.

Another approach we could take to protect state event metadata would be to go full P2P, and have the servers just be store-and-forward nodes (of encrypted blobs). Then the servers running in the clients can happily maintain the metadata in plaintext, and index it and speak SS to the client to rapidly paginate through it. (Although in that model, why would we even need E2EE...?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and have the servers just be store-and-forward nodes (of encrypted blobs).

(Although in that model, why would we even need E2EE...?)

Perhaps I'm misreading the architecture you're proposing here, but if there's no E2EE, what are the encrypted blobs encrypted with?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the ordering of sliding sync is already broken by the current encryption for rooms. The notifications counts for encrypted rooms are usually incorrect and get recalculated by clients. So there is precedent for having clients apply fixups to server side data. Now of course if you want alphabetical sorting, that will be a bit more annoying than just notification counts, since that will affect more rooms. However the roomname is (next to the topic) usually the event with the most sensitive information.

If this MSC is changed to be able to detect room name events without being able to decrypt those events, those could just get sent along the sync. This would not increase the database load, since the server already needs to load those events usually to sort the rooms (or should at least not make as much of a difference). It would increase bandwidth usage for clients that sort by displayname and make the sync a bit slower. However that might be a trade-off, that is worth the added privacy and security.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sliding sync aside, there's several reasons why the server would want to know the room name or at least where the room sits in an ordered set. Examples include email notifications (would need to be unencrypted or completely made of magic), abuse mitigations (can be encrypted), and directories (spaces, non-public rooms, etc - can be encrypted).

Preferably, we keep sliding sync design decisions on the sliding sync MSCs, as this conversation is now heading into that territory.

Copy link
Contributor

@poljar poljar Jan 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sliding sync aside, there's several reasons why the server would want to know the room name or at least where the room sits in an ordered set. Examples include email notifications (would need to be unencrypted or completely made of magic), abuse mitigations (can be encrypted), and directories (spaces, non-public rooms, etc - can be encrypted).

I'm not sure what you're trying to say here, does this mean that we don't want to encrypt m.room.name, if so that should be listed in the MSC besides the other events that will stay unencrypted.

On the other hand if you mean that it's OK to encrypt m.room.name then I think it's OK to discuss how this will impact sliding sync and other server functionality. We should perhaps expand the small section of broken functionality to get a comprehensive list. As of now we only have this single comment:

This effectively breaks the room directory and #2946 as they use named fields for details such as the room name.

Furthermore, I don't think it was obvious to people, it certainly wasn't obvious to me, that leaking the alphabetical order of the m.room.name events will likely leak, albeit only partially, the contents as well.

Preferably, we keep sliding sync design decisions on the sliding sync MSCs, as this conversation is now heading into that territory.

Well sure, to some extent. If I go to the sliding sync MSC, then people will tell me that the sliding sync MSC isn't concerned with encrypted state, we'll figure that out later in the encrypted state MSC.

The problem is that sliding sync and encrypted state have a deeply intertwined relationship. Fetching events via the /sync endpoint lies at the heart of the protocol, if you're going to potentially break it by encrypting some event then this should be discussed and understood.

So things that I would like to know:

  1. Do we want to encrypt m.room.name
  2. Do we want to leak the order of the m.room.name events to the server
  3. If we want to encrypt m.room.name, which services will be impacted by this
  4. If we don't want to leak the order, how does this affect sliding sync. Is the alphabetical order even crucial to the performance of sliding sync.

Out of those 4 questions, the only one that would be sensible to shove into the sliding sync MSC would be the last one.

Conflicts are extremely easy to manufacture, however the feature making it easy to form conflicts is needed
in order to protect the event's metadata. The conflict resolution algorithm can additionally be easily
abused (ie: sending a timestamp for the year 4025 to force the room name to never change), however at that
point the room is making it harder on itself to change over time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows all users who can send state events to bypass power level restrictions, which isn't just "making it harder on itself". The server cannot enforce PL rules effectively with this change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a unique problem for this MSC, and state events are typically only sent by those who have higher power in the room: they would be spamming themselves.

* `m.room.third_party_invite`
* `m.room.history_visibility`
* `m.room.guest_access`
* `m.room.encryption` - Though not critical to the server-side protocol, it is critical to clients.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And tombstone events.

Other approaches such as deprecating `name`, `topic`, etc in favour of a `state` array are best considered
for other MSCs.

## Alternatives
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not discussed here is the option of moving the entire room model client-side, where servers just move JSON blobs around and clients do event auth, state res, etc locally in an encrypted setting. This is obviously more involved/complicated to pull off, though.

For media events like room avatars, encryption is the same as regular room messages. The media is uploaded
encrypted to the homeserver then detailed in a payload that is also encrypted to the final room state event.

## Potential issues
Copy link
Contributor

@Kladki Kladki Aug 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially sensitive fields of exempt state events (i.e. reason of m.room.member) are not encrypted, despite them not being required for event authorization/state resolution. If the proposal does not wish to find a solution for this, then it should recommend that clients somehow make clear that membership reasons are not encrypted.

@kaylendog
Copy link

I have just discussed possible directions with @andybalaam. We think there are three possible ways to approach the handling of state keys:

  • Utilise string packing;
  • Introduce an adjacent type field;
  • HMAC obfuscation (or similar).

String Packing

This is the simplest to implement and requires no special handling by the server, only a simple client-side check to ensure the decrypted state key matches the one contained in the packed state key on the encrypted event. It is messy and a bit of a hack. We need to settle on a suitable delimiter, and we may encounter issues with the 255-byte length restriction on state keys. Other than that, it's a totally reasonable solution, at least compared to the others.

Adjacent Type Field

This approach would introduce a field holding the true event type to the event body, either as a content.type field, or a top-level true_type field. This field would need to be considered by the server during state resolution. It's more invasive than string packing, but a less hacky approach.

HMAC Obfuscation

This is the ideal solution, as it hides the state key and type from the server entirely; however, there are some considerable downsides. We have two choices:

  • Use a static key generated on room creation to encrypt all state events for the duration of the room's existence;
  • Rotate the key periodically, perhaps based on the current session key.

In either case, we'd need to introduce a method to share the key on invite/join to allow users to decrypt room state, which would need to be entirely separate from but perhaps similar to history sharing. Furthermore, if fetching this key fails, the room becomes unusable, as the state events will be encrypted, and there is no obvious way to recover.

The latter case has even more issues: rotating the key breaks the server's ability to track room state, since two events with identical state keys will produce encrypted events with different hashed state keys when using different (HMAC) keys. The server will treat each as unique and send both to clients. This would require clients to perform state resolution locally (to decide which of two clashing events to accept), which in turn would require them to consume and understand the room DAG.

I suspect this will be true for any rotated or non-deterministic hash function that can hide the true type and state key. Using a deterministic function is ineffective because a malicious server can easily guess possible input values and gain access to the secret information.

The server being able to read the type and state_key of state events is undesirable, but it isn’t necessarily the end of the world: we could just recommend that sensitive information should not be placed in the type and state key, and live with the metadata leakage, which would be much less than it is today.

@dkasak
Copy link
Member

dkasak commented Sep 19, 2025

Using a deterministic function is ineffective because a malicious server can easily guess possible input values and gain access to the secret information.

I'm curious what mechanisms you have in mind here. Frequency analysis? Potential oracle attacks if the server can force a client to send a state event with a known type and state key?

@kaylendog
Copy link

Using a deterministic function is ineffective because a malicious server can easily guess possible input values and gain access to the secret information.

I'm curious what mechanisms you have in mind here. Frequency analysis? Potential oracle attacks if the server can force a client to send a state event with a known type and state key?

I'm primarily thinking about precomputation, although I think frequency analysis would definitely work pretty effectively.

Most state keys are either empty or contain a fully qualified user ID, the latter of which is arguably more often than not the same as sender_id, e.g. MSC3672, which is visible to the server anyway in the sender field. This yields a tiny search space, being #types * #users or so in size, which remains tractible to brute-force even for large rooms.

We could salt the input, but the salt needs to:

  • Be shared by all members of the room;
  • Invisible to the server;
  • Preserve the uniqueness of the server's (type, state_key) -> event mapping.

I fear these requirements may be inherently incompatible to some degree, and any solution would force us to consider the key-distribution/recovery/resolution problems I mentioned to arise with HMAC.

@dkasak
Copy link
Member

dkasak commented Sep 20, 2025

How would precomputation work though? HMAC is a keyed construct and in our context the key is per room and the server doesn't have it.

@kaylendog
Copy link

kaylendog commented Sep 22, 2025

How would precomputation work though? HMAC is a keyed construct and in our context the key is per room and the server doesn't have it.

I agree! But if we use single-key HMAC, then we would encounter the issues I outlined in my earlier comment, where:

  • We wouldn't have forward or backward secrecy, i.e. a single exposure of the key enables precomputation, defeating the purpose of the obfuscation.
  • We'd need to implement keysharing separately from the existing history sharing work.
  • Failing to receive the key renders the room unusable to invitees.

@dkasak
Copy link
Member

dkasak commented Sep 22, 2025

Okay, I think we're landing on the same page now.

Confidentiality without FS and PCS is still quite a strong property; so even though it would be better if we could magically have them, the lack of them is IMO not a good argument for keeping the information cleartext.

I agree it makes receiving the key paramount, but this is something that we just need to be able to pull off, since features in the future (e.g. cryptographic room membership) will require us to maintain per-room shared state without which rooms will be broken. So this is a "git gud" type of situation.

The strongest argument against it would be to show that due to frequency analysis and oracles, the feature doesn't actually offer substantive confidentiality after all.

@kaylendog
Copy link

The strongest argument against it would be to show that due to frequency analysis and oracles, the feature doesn't actually offer substantive confidentiality after all.

I've put together some rudimentary Python code to attempt to emulate a potential frequency analysis attack. It appears possible to do better than random guessing, but achieving even 20% accuracy requires at least 100 state events available for analysis. More info soon!

@kaylendog
Copy link

kaylendog commented Sep 22, 2025

Okay, I've got some results I'm reasonably happy with. The setup is this:

  • The server can see state events in a public room, including their sender, state key and type.
  • The server can see state events in an encrypted room, including the sender and hashed state key, i.e. HMAC(type || state_key).

If you assume that the distribution of state events in the two rooms is similar, you can do the following:

  1. Compute the distribution of state event types and state key pairs for the public room.
  2. Compute the distribution of hashed state keys for the private room.
  3. Compare alignments of the resulting distributions, e.g. MSE or a greedy approach (see below).
  4. Convert this alignment into a mapping between hashed state keys and true type and state key pairs.

I mocked up a small experiment that operates under the following conditions: I generate a random distribution for the public room (weights are selected from a uniform distribution, then generate some artificial datasets of n state events. I model the similarity of the private room to the public using a bias parameter k that linearly interpolates between uniform and public weights. I then apply the method above to the datasets and compute the accuracy of the resulting mapping against the ground truth.

mean_accuracy_vs_k

For various dataset sizes n and biases k, it seems clear that we can derive an accurate mapping with some degree of confidence. I take a greedy approach and construct the mapping by comparing the two sorted distributions. A better approach could, for example, utilise the Hungarian Algorithm and MSE, while still running in polynomial time. There's also the question of whether my linear interpolation of weights makes any sense.

An obvious question would be whether the similarity assumption holds in practice. Is there a typical distribution of state events in a given room, and would encrypted rooms exhibit a similar distribution?

@anoadragon453
Copy link
Member

@dkasak @kaylendog could you move the above discussion into a PR comment thread? Typically top-level comments are reserved for spec process information :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

client-server Client-Server API e2e kind:core MSC which is critical to the protocol's success needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. proposal A matrix spec change proposal

Projects

Status: Scheduled - v1.8

Development

Successfully merging this pull request may close these issues.