Limit outgoing to_device EDU size to 65536 #18416

MatMaul · 2025-05-09T14:15:27Z

If a set of messages exceeds this limit, the messages are split across several EDUs.

Should fix #17035.

There is currently no official specced limit for EDUs, but the consensus seems to be that it would be useful to have one to avoid this bug by bounding the transaction size.

As a side effect it also limits the size of a single to-device message to a bit less than 65536.

This should probably be added to the spec similarly to the message size limit.

Pull Request Checklist

Pull request is based on the develop branch
Pull request includes a changelog file.
Code style is correct

synapse/api/constants.py

synapse/handlers/devicemessage.py

MadLittleMods · 2025-05-20T15:43:29Z

synapse/handlers/devicemessage.py

+            edu_contents = get_device_message_edu_contents(
+                sender_user_id, message_type, messages, context
+            )
+            remote_edu_contents[destination] = edu_contents


Instead of changing the structure of remote_edu_contents from a map of destination to EDU meta into a map of destination to multiple EDU metas, could we just call add_messages_to_device_inbox(...) multiple times?

The multi version should gain some performance, since it runs in a single transaction and pre-allocates all the stream IDs.

I am fine if we decide to keep it simple and sacrifice some perf for that, but I am not sure it's worth it here, since it's not overly complicated.

@MadLittleMods are you fine with keeping it like that ?

I haven't really investigated the intricacies here but my leaning would be on the simple solution I suggested.

Performance wasn't the thing we're trying to address.

synapse/handlers/devicemessage.py

tests/rest/client/test_sendtodevice.py

erikjohnston · 2025-05-23T09:06:59Z

synapse/handlers/devicemessage.py

+
+    for recipient, message in messages.items():
+        # We remove 2 for the curly braces and add 1 for the colon
+        message_entry_size = len(encode_canonical_json({recipient: message})) - 2 + 1


Drive-by thought: instead of trying to work out the lengths and calculate the number of messages we can add, it might be easier to just generate the EDU and then check its size. If it's too big, you halve the number of messages and try again.

The common case will be that we don't need to split up the EDU, at the expense of duplicating some work. It feels a bit hacky, but I think it might be a little less brittle?
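For illustration, here is a minimal sketch of one possible reading of the "generate, check, halve" idea. It is not the code from this PR: `encode`, `build_edu`, `edu_size` and `split_messages` are hypothetical names, `json.dumps` with sorted keys and compact separators stands in for `encode_canonical_json`, and the EDU shape is simplified.

```python
import json
from typing import Any, Dict, List

MAX_EDU_SIZE = 65536  # limit named in the PR title


def encode(obj: Any) -> bytes:
    # Stand-in for encode_canonical_json (assumption: sorted keys, no whitespace).
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def build_edu(messages: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical, simplified EDU shape used only for size checks.
    return {"edu_type": "m.direct_to_device", "content": {"messages": messages}}


def edu_size(messages: Dict[str, Any]) -> int:
    return len(encode(build_edu(messages)))


def split_messages(
    messages: Dict[str, Any], max_size: int = MAX_EDU_SIZE
) -> List[Dict[str, Any]]:
    # Common case: everything fits in one EDU, so serialize once and return.
    if edu_size(messages) <= max_size:
        return [messages]
    # A single entry that is still too big cannot be split any further.
    if len(messages) <= 1:
        raise ValueError("a single to-device message exceeds the EDU size limit")
    # Otherwise halve the batch and retry on each half.
    items = list(messages.items())
    mid = len(items) // 2
    return split_messages(dict(items[:mid]), max_size) + split_messages(
        dict(items[mid:]), max_size
    )
```

The trade-off is the one described above: the common case costs one serialization, while oversized batches are serialized several times as they are halved.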

I am not sure it will be that much simpler for comprehension TBH.

If we think we can eat the perf cost, the simplest option is probably to call encode_canonical_json on the whole EDU for each added message, then remove the message and create a new EDU if the result is larger than the max.

My calculation tricks were there to avoid doing a full serialization on each added message.

And add a special case, tried first, where we try to put everything in one EDU?
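A rough sketch of that alternative, under the same caveats: the names below (`encode`, `build_edu`, `edu_size`, `pack_messages`) are hypothetical, `json.dumps` stands in for `encode_canonical_json`, and the EDU shape is simplified. It tries the "everything in one EDU" case first, then re-serializes the whole candidate EDU after each added message.

```python
import json
from typing import Any, Dict, List

MAX_EDU_SIZE = 65536  # limit named in the PR title


def encode(obj: Any) -> bytes:
    # Stand-in for encode_canonical_json (assumption: sorted keys, no whitespace).
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def build_edu(messages: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical, simplified EDU shape used only for size checks.
    return {"edu_type": "m.direct_to_device", "content": {"messages": messages}}


def edu_size(messages: Dict[str, Any]) -> int:
    return len(encode(build_edu(messages)))


def pack_messages(
    messages: Dict[str, Any], max_size: int = MAX_EDU_SIZE
) -> List[Dict[str, Any]]:
    # Special case tried first: everything fits in a single EDU.
    if edu_size(messages) <= max_size:
        return [dict(messages)]

    edus: List[Dict[str, Any]] = []
    current: Dict[str, Any] = {}
    for recipient, message in messages.items():
        candidate = {**current, recipient: message}
        if edu_size(candidate) <= max_size:
            current = candidate
            continue
        # The message does not fit: close the current EDU and start a new one.
        if current:
            edus.append(current)
        current = {recipient: message}
        if edu_size(current) > max_size:
            raise ValueError("a single to-device message exceeds the EDU size limit")
    if current:
        edus.append(current)
    return edus
```

This does a full serialization per added message, which is the cost the length arithmetic was meant to avoid, but there is no manual byte counting to get wrong.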

I don't know TBH, the idea of splitting in 2 is nice too, but I feel like it is going to be quite annoying to implement and hence not simpler.

@erikjohnston @MadLittleMods thoughts on where we would like to go here ?

The manual message serialization is a bit error prone and confusing to follow.

@erikjohnston's suggestion sounds good to me if you're willing to adapt


mcalinghee · 2025-08-20T21:17:08Z

Looks like the CI tests are flaky as mentioned here : #18537

MadLittleMods · 2025-08-20T22:34:43Z

tests/rest/client/test_sendtodevice.py

+        # FIXME: Because huge log line is triggered in this test,
+        # trial breaks, sometimes (flakily) failing the test run.
+        # ref: https://github.com/twisted/twisted/issues/12482
+        # To remove this, we would need to fix the above issue and
+        # update, including in olddeps (so several years' wait).


Looks like there is still another case somewhere -> https://github.com/element-hq/synapse/actions/runs/17109765260/job/48528124654?pr=18416 (twisted.protocols.amp.TooLong)

I tried this to track down the case, but I was not able to reproduce the issue on my side.
If you have an idea of how to track it down, I would be interested.

In the meantime, I made a guess at what the issue was by looking at the code.

mcalinghee · 2025-08-22T07:07:15Z

Not able to reproduce the issue here by running COMPLEMENT_DIR=../complement POSTGRES=1 WORKERS=1 ./scripts-dev/complement.sh
Probably missing something here.

MadLittleMods · 2025-09-11T19:22:51Z

synapse/api/constants.py

+# This is defined in the Matrix spec and enforced by the receiver.
+MAX_EDUS_PER_TRANSACTION = 100
+# A transaction can contain up to 100 EDUs but synapse reserves 10 EDUs for other purposes
+SYNAPSE_EDUS_PER_TRANSACTION = 10


Suggested change

-SYNAPSE_EDUS_PER_TRANSACTION = 10
+NUMBER_OF_RESERVED_EDUS_PER_TRANSACTION = 10

MadLittleMods · 2025-09-11T19:23:29Z

synapse/api/constants.py

+
+# This is defined in the Matrix spec and enforced by the receiver.
+MAX_EDUS_PER_TRANSACTION = 100
+# A transaction can contain up to 100 EDUs but synapse reserves 10 EDUs for other purposes


Suggested change

-# A transaction can contain up to 100 EDUs but synapse reserves 10 EDUs for other purposes
+# A transaction can contain up to 100 EDUs but synapse reserves 10 EDUs for other purposes
+# like trickling out some device list updates.

MadLittleMods · 2025-09-11T19:26:08Z

synapse/handlers/devicemessage.py

+    sender_user_id: str,
+    message_type: str,
+    messages: Dict[str, Dict[str, JsonDict]],
+    context: Dict[str, Any],


Suggested change

-    context: Dict[str, Any],
+    tracing_context: Dict[str, Any],

MadLittleMods · 2025-09-11T19:28:45Z

synapse/handlers/devicemessage.py

+    sender_user_id: str,
+    message_type: str,
+    messages: Dict[str, Dict[str, JsonDict]],
+    context: Dict[str, Any],


Instead of passing this in, can we just call tracing_context = get_active_span_text_map() in the function?

MadLittleMods · 2025-09-11T19:35:51Z

synapse/handlers/devicemessage.py

+    sender_user_id: str,
+    message_type: str,
+    context: Dict[str, Any],
+    message_id: str = random_string(16),


I don't think this will work correctly and I'm surprised this isn't causing a lint to be triggered (we had a lint for this before). I guess this is something we need to re-enable -> https://docs.astral.sh/ruff/rules/function-call-in-default-argument/

random_string(16) will only be executed once, when the function is defined, and the result will be shared across all messages. We should also have a test that catches this.
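A small standalone illustration of the pitfall. The `random_string` below is a stand-in for the Synapse helper, and `make_edu_buggy` / `make_edu_fixed` are hypothetical names, not functions from this PR.

```python
import random
import string
from typing import Dict, Optional


def random_string(length: int) -> str:
    # Stand-in for the Synapse helper of the same name.
    return "".join(random.choice(string.ascii_letters) for _ in range(length))


def make_edu_buggy(message_id: str = random_string(16)) -> Dict[str, str]:
    # The default is evaluated once, at function definition time,
    # so every call without an explicit message_id reuses the same value.
    return {"message_id": message_id}


def make_edu_fixed(message_id: Optional[str] = None) -> Dict[str, str]:
    # Use None as a sentinel and generate a fresh ID on each call.
    if message_id is None:
        message_id = random_string(16)
    return {"message_id": message_id}
```

Calling `make_edu_buggy()` twice returns the same `message_id`, while `make_edu_fixed()` generates a new one each time. A test that builds two EDUs and asserts their IDs differ would catch the regression.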

MadLittleMods · 2025-09-11T19:39:02Z

synapse/handlers/devicemessage.py

+            edu_contents = get_device_message_edu_contents(
+                sender_user_id, message_type, messages, context
+            )
+            remote_edu_contents[destination] = edu_contents


I haven't really investigated the intricacies here but my leaning would be on the simple solution I suggested.

Performance wasn't the thing we're trying to address.

MadLittleMods · 2025-09-11T19:41:37Z

synapse/handlers/devicemessage.py

+            )
+
+        if len(current_edu_content["messages"]) > 0:
+            message_entry_size += 1  # Add 1 for the comma


Give an example.

This is also confusing because we also state "add 1 for the comma per message" above. Which commas? Is having both correct?
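As a worked example of this kind of accounting (a sketch only, with `json.dumps` standing in for `encode_canonical_json` and hypothetical helper names), the size of a JSON object can be rebuilt from its entries: 2 bytes for the outer braces, each entry's encoded size minus its own braces, and one comma between consecutive entries. In this sketch the colon between key and value is already part of the encoded entry, so only the separating commas are added.

```python
import json
from typing import Any, Dict


def encode(obj: Any) -> bytes:
    # Stand-in for encode_canonical_json (assumption: sorted keys, no whitespace).
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def entry_size(recipient: str, message: Any) -> int:
    # '{"k":v}' minus the two braces leaves '"k":v'; the colon is included.
    return len(encode({recipient: message})) - 2


def messages_size(messages: Dict[str, Any]) -> int:
    entries = [entry_size(r, m) for r, m in messages.items()]
    commas = max(len(entries) - 1, 0)  # one comma between consecutive entries
    return 2 + sum(entries) + commas  # 2 for the outer braces


messages = {
    "@alice:example.org": {"DEVICE1": {"body": "hi"}},
    "@bob:example.org": {"DEVICE2": {"body": "hello"}},
}

# The incremental count matches a full serialization of the same object.
assert messages_size(messages) == len(encode(messages))
```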

MadLittleMods · 2025-09-11T19:42:41Z

synapse/handlers/devicemessage.py

+
+    for recipient, message in messages.items():
+        # We remove 2 for the curly braces and add 1 for the colon
+        message_entry_size = len(encode_canonical_json({recipient: message})) - 2 + 1


The manual message serialization is a bit error prone and confusing to follow.

@erikjohnston's suggestion sounds good to me if you're willing to adapt

Limit to_device EDU size to 65536

61fa1b9

MatMaul force-pushed the edu-limit-size branch from 8add186 to 61fa1b9 Compare May 9, 2025 14:19

MatMaul marked this pull request as ready for review May 9, 2025 14:38

MatMaul requested a review from a team as a code owner May 9, 2025 14:38

MatMaul and others added 10 commits May 12, 2025 01:43

Increment to_device stream for each EDU otherwise we loose some

c80f24d

Simplify

eda00e1

Add comment

6627bed

Cosmetic

c6bc691

Cosmetic

57ab541

Add logs

a0e6dc3

Improve logs

9ce1488

fix bug

5e86c59

Improve logs

35d98b6

Merge remote-tracking branch 'origin/develop' into edu-limit-size

6be7bcc

MadLittleMods reviewed May 20, 2025


erikjohnston reviewed May 23, 2025


MadLittleMods changed the title ~~Limit to_device EDU size to 65536~~ Limit outgoing to_device EDU size to 65536 Jun 3, 2025

MadLittleMods added a commit that referenced this pull request Jun 19, 2025

Remove unrelated comment from reviewing another PR

ec16224

From my review of #18416

mcalinghee and others added 8 commits August 20, 2025 15:53

Apply suggestion from @MadLittleMods

a15a585

Co-authored-by: Eric Eastwood <[email protected]>

Apply suggestion from @MadLittleMods

e81e737

Co-authored-by: Eric Eastwood <[email protected]>

Apply suggestion from @MadLittleMods

1f9d243

Co-authored-by: Eric Eastwood <[email protected]>

Apply suggestion from @MadLittleMods

b503f97

Co-authored-by: Eric Eastwood <[email protected]>

add comment and add helper create_new_to_device_edu_content()

eef8132

Merge branch 'develop' into edu-limit-size

0e74f3e

add helper method and refactor EDUs constant

cf7441a

fix trial tests due to twisted lib issue

71a26a4

MadLittleMods reviewed Aug 20, 2025


try to fix test flaky issue

55afc88

MatMaul added 2 commits September 2, 2025 11:14

Merge remote-tracking branch 'origin/develop' into edu-limit-size

f234226

Merge remote-tracking branch 'origin/develop' into edu-limit-size

cb3bcb5

MatMaul force-pushed the edu-limit-size branch from 0277959 to cb3bcb5 Compare September 9, 2025 13:55

MadLittleMods reviewed Sep 11, 2025

