Skip to content

Conversation

procr1337
Copy link
Contributor

@procr1337 procr1337 commented Apr 24, 2025

Attempted fix for #2729 based on previous discussion in #4954.

To summarize, the situation of pending room key sharing requests prior to calls to GroupSessionManager::share_room_key can indeed occur if matrix-sdk-crypto is used directly. In the case where matrix-sdk wraps it, Room::share_room_key makes sure to send pending requests right after, as well as mark them as sent which will cause the corresponding entry in to_share_with_set to be moved to shared_with_set. If there is a failure, the room key is discarded right after (to avoid UTD errors from missing keys).

This is of course racy, since this entire sequence is not atomic. After an interruption and restart, we can end up back in GroupSessionManager::share_room_key with pending requests (which after all is the point of persisting the pending requests in the first place).

It seems ugly to export shared_with_set and to_share_with_set. It feels like there should be abstractions similar to is_shared_with that provide a semantic view into GroupSessionManager's state carrying sufficient information for all existing uses of shared_with_set and to_share_with_set outside of the impl (all of which are in share_strategy.rs). If you prefer this approach let me know.

Signed-off-by: Niklas Baumstark [email protected]

niklasb added 2 commits April 24, 2025 18:15
…uests when collecting devices that have already received a session.

This avoids conditions where a key may be shared with a device only
after we decided that it is fine to reuse (and not rotate) the session
based on the wrong assumption that that particular device does not have
the keys.

Fixes matrix-org#2729
@procr1337 procr1337 requested review from a team as code owners April 24, 2025 16:52
@procr1337 procr1337 requested review from poljar and removed request for a team April 24, 2025 16:52
Copy link

codecov bot commented Apr 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.88%. Comparing base (91d085c) to head (e267f5e).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4975      +/-   ##
==========================================
- Coverage   85.88%   85.88%   -0.01%     
==========================================
  Files         325      325              
  Lines       35699    35712      +13     
==========================================
+ Hits        30661    30671      +10     
- Misses       5038     5041       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@poljar poljar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this looks good and indeed should close #2729.

Comment on lines +444 to +452
if let Some(for_user) = guard.get(user_id) {
shared.extend(for_user.iter().filter_map(|(d, info)| {
if is_actually_shared(info) {
Some(AsRef::<DeviceId>::as_ref(d))
} else {
None
}
}));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of a shame that we can't reuse OutboundGroupSession::is_shared_with or at least put this into a new method.

But that would introduce unnecessary allocations due to the locking.

Oh well...

@poljar
Copy link
Contributor

poljar commented Apr 25, 2025

It seems ugly to export shared_with_set and to_share_with_set. It feels like there should be abstractions similar to is_shared_with that provide a semantic view into GroupSessionManager's state carrying sufficient information for all existing uses of shared_with_set and to_share_with_set outside of the impl (all of which are in share_strategy.rs). If you prefer this approach let me know.

I was thinking that it would be nice to have something like OutboundGroupSession::devices_shared_with(&self, user_id: &UserId) -> BTreeSet<&DeviceId>.

But that is tricky to add due to the locks protecting the sets. We would need to combine the return value with the lock guards, or we would need to clone the device IDs.

So I think the current approach is fine.

@poljar poljar merged commit 6e119c7 into matrix-org:main Apr 25, 2025
42 checks passed
@procr1337
Copy link
Contributor Author

@poljar In my limited understanding of rust patterns I guess the typical approach would be to return an impl Iterator<&DeviceId> which holds the read locks - thanks for your feedback, it has been a pleasant experience contributing to this code base so far.

@procr1337 procr1337 deleted the fix-2729 branch April 25, 2025 14:21
@poljar
Copy link
Contributor

poljar commented Apr 25, 2025

@poljar In my limited understanding of rust patterns I guess the typical approach would be to return an impl Iterator<&DeviceId> which holds the read locks.

Yeah that would likely work but as you have to take ownership of the guard and borrow a value behind the guard this gets quite tricky.

I'm not sure if the borrow checker is smart enough to know that you should be able to do this, i.e. not sure if you can get away without using unsafe.

I would say it's not really worth it to spend time on this, but if you manage to get it work without unsafe I wouldn't object to merging it.

Thanks for your feedback, it has been a pleasant experience contributing to this code base so far.

Thank you for the patches and for the patience while waiting for the review.

richvdh added a commit to matrix-org/matrix-js-sdk that referenced this pull request Jun 21, 2025
For js-sdk users, this includes the following:

    -   Send stable identifier `sender_device_keys` for MSC4147 (Including device keys with Olm-encrypted events).
        ([#4964](matrix-org/matrix-rust-sdk#4964))

    -   Check the `sender_device_keys` field on _all_ incoming Olm-encrypted to-device messages and ignore any to-device messages which include the field but whose data is invalid (as per [MSC4147](matrix-org/matrix-spec-proposals#4147)).
        ([#4922](matrix-org/matrix-rust-sdk#4922))

    -   Fix bug which caused room keys to be unnecessarily rotated on every send in the presence of blacklisted/withheld devices in the room.
        ([#4954](matrix-org/matrix-rust-sdk#4954))

    -   Fix [matrix-rust-sdk#2729](matrix-org/matrix-rust-sdk#2729) which in rare cases can cause room key oversharing.
        ([#4975](matrix-org/matrix-rust-sdk#4975))
github-merge-queue bot pushed a commit to matrix-org/matrix-js-sdk that referenced this pull request Jun 23, 2025
For js-sdk users, this includes the following:

    -   Send stable identifier `sender_device_keys` for MSC4147 (Including device keys with Olm-encrypted events).
        ([#4964](matrix-org/matrix-rust-sdk#4964))

    -   Check the `sender_device_keys` field on _all_ incoming Olm-encrypted to-device messages and ignore any to-device messages which include the field but whose data is invalid (as per [MSC4147](matrix-org/matrix-spec-proposals#4147)).
        ([#4922](matrix-org/matrix-rust-sdk#4922))

    -   Fix bug which caused room keys to be unnecessarily rotated on every send in the presence of blacklisted/withheld devices in the room.
        ([#4954](matrix-org/matrix-rust-sdk#4954))

    -   Fix [matrix-rust-sdk#2729](matrix-org/matrix-rust-sdk#2729) which in rare cases can cause room key oversharing.
        ([#4975](matrix-org/matrix-rust-sdk#4975))
github-merge-queue bot pushed a commit to matrix-org/matrix-js-sdk that referenced this pull request Jun 23, 2025
For js-sdk users, this includes the following:

    -   Send stable identifier `sender_device_keys` for MSC4147 (Including device keys with Olm-encrypted events).
        ([#4964](matrix-org/matrix-rust-sdk#4964))

    -   Check the `sender_device_keys` field on _all_ incoming Olm-encrypted to-device messages and ignore any to-device messages which include the field but whose data is invalid (as per [MSC4147](matrix-org/matrix-spec-proposals#4147)).
        ([#4922](matrix-org/matrix-rust-sdk#4922))

    -   Fix bug which caused room keys to be unnecessarily rotated on every send in the presence of blacklisted/withheld devices in the room.
        ([#4954](matrix-org/matrix-rust-sdk#4954))

    -   Fix [matrix-rust-sdk#2729](matrix-org/matrix-rust-sdk#2729) which in rare cases can cause room key oversharing.
        ([#4975](matrix-org/matrix-rust-sdk#4975))
github-merge-queue bot pushed a commit to matrix-org/matrix-js-sdk that referenced this pull request Jun 23, 2025
For js-sdk users, this includes the following:

    -   Send stable identifier `sender_device_keys` for MSC4147 (Including device keys with Olm-encrypted events).
        ([#4964](matrix-org/matrix-rust-sdk#4964))

    -   Check the `sender_device_keys` field on _all_ incoming Olm-encrypted to-device messages and ignore any to-device messages which include the field but whose data is invalid (as per [MSC4147](matrix-org/matrix-spec-proposals#4147)).
        ([#4922](matrix-org/matrix-rust-sdk#4922))

    -   Fix bug which caused room keys to be unnecessarily rotated on every send in the presence of blacklisted/withheld devices in the room.
        ([#4954](matrix-org/matrix-rust-sdk#4954))

    -   Fix [matrix-rust-sdk#2729](matrix-org/matrix-rust-sdk#2729) which in rare cases can cause room key oversharing.
        ([#4975](matrix-org/matrix-rust-sdk#4975))
toger5 pushed a commit to matrix-org/matrix-js-sdk that referenced this pull request Jul 18, 2025
For js-sdk users, this includes the following:

    -   Send stable identifier `sender_device_keys` for MSC4147 (Including device keys with Olm-encrypted events).
        ([#4964](matrix-org/matrix-rust-sdk#4964))

    -   Check the `sender_device_keys` field on _all_ incoming Olm-encrypted to-device messages and ignore any to-device messages which include the field but whose data is invalid (as per [MSC4147](matrix-org/matrix-spec-proposals#4147)).
        ([#4922](matrix-org/matrix-rust-sdk#4922))

    -   Fix bug which caused room keys to be unnecessarily rotated on every send in the presence of blacklisted/withheld devices in the room.
        ([#4954](matrix-org/matrix-rust-sdk#4954))

    -   Fix [matrix-rust-sdk#2729](matrix-org/matrix-rust-sdk#2729) which in rare cases can cause room key oversharing.
        ([#4975](matrix-org/matrix-rust-sdk#4975))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants