-
Notifications
You must be signed in to change notification settings - Fork 396
Limit outgoing to_device EDU size to 65536 #18416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
edu_contents = get_device_message_edu_contents( | ||
sender_user_id, message_type, messages, context | ||
) | ||
remote_edu_contents[destination] = edu_contents |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of changing the structure of remote_edu_contents
(was a map from destination to EDU meta) (to a map from destination to multiple EDU meta), could we just call add_messages_to_device_inbox(...)
multiple times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The multi version should have some gain performance side, since it's in an unique transaction, and pre-allocate all the stream ids.
I am fine if we decide to keep it simple and sacrifice some perf for that, but I am not sure it's worth it here it's not overly complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MadLittleMods are you fine with keeping it like that ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't really investigated the intricacies here but my leaning would be on the simple solution I suggested.
Performance wasn't the thing we're trying to address.
|
||
for recipient, message in messages.items(): | ||
# We remove 2 for the curly braces and add 1 for the colon | ||
message_entry_size = len(encode_canonical_json({recipient: message})) - 2 + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by thought: instead of trying to work out the lengths and calculate the number of messages we can add, it might be easier to just generate the EDU and then check the size of it. If its too big you half the number of messages and try again.
The common case will be that we don't need to split up the EDU, at the expense of duplicating some work. It feels a bit hacky, but I think might be a little less brittle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure it will be that much simpler for comprehension TBH.
If we think we can eat the perf cost, the simpler is probably to call encode_canonical_json
on the whole EDU for each added message, and remove it and create a new EDU if it's larger than the max.
My calculation tricks were to avoid doing a full serialization on each added message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And add a special case tried first where we try to put everything in one ?
I don't know TBH, the idea of splitting in 2 is nice too but I feel like it is going to be quite annoying to implement and hence not simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@erikjohnston @MadLittleMods thoughts on where we would like to go here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The manual message serialization is a bit error prone and confusing to follow.
@erikjohnston's suggestion sounds good to me if you're willing to adapt
From my review of #18416
Co-authored-by: Eric Eastwood <[email protected]>
Co-authored-by: Eric Eastwood <[email protected]>
Co-authored-by: Eric Eastwood <[email protected]>
Co-authored-by: Eric Eastwood <[email protected]>
Looks like the CI tests are flaky as mentioned here : #18537 |
# FIXME: Because huge log line is triggered in this test, | ||
# trial breaks, sometimes (flakily) failing the test run. | ||
# ref: https://github.com/twisted/twisted/issues/12482 | ||
# To remove this, we would need to fix the above issue and | ||
# update, including in olddeps (so several years' wait). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there is still another case somewhere -> https://github.com/element-hq/synapse/actions/runs/17109765260/job/48528124654?pr=18416 (twisted.protocols.amp.TooLong
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this to track the case but I was not able to reproduce the issue on my side.
If you have an idea on how to track the issue, I would be interested.
In the meantime, I had a guess on what was the issue by looking at the code.
Not able to reproduce the issue here by running |
0277959
to
cb3bcb5
Compare
# This is defined in the Matrix spec and enforced by the receiver. | ||
MAX_EDUS_PER_TRANSACTION = 100 | ||
# A transaction can contain up to 100 EDUs but synapse reserves 10 EDUs for other purposes | ||
SYNAPSE_EDUS_PER_TRANSACTION = 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SYNAPSE_EDUS_PER_TRANSACTION = 10 | |
NUMBER_OF_RESERVED_EDUS_PER_TRANSACTION = 10 |
|
||
# This is defined in the Matrix spec and enforced by the receiver. | ||
MAX_EDUS_PER_TRANSACTION = 100 | ||
# A transaction can contain up to 100 EDUs but synapse reserves 10 EDUs for other purposes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# A transaction can contain up to 100 EDUs but synapse reserves 10 EDUs for other purposes | |
# A transaction can contain up to 100 EDUs but synapse reserves 10 EDUs for other purposes | |
# like trickling out some device list updates. |
sender_user_id: str, | ||
message_type: str, | ||
messages: Dict[str, Dict[str, JsonDict]], | ||
context: Dict[str, Any], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
context: Dict[str, Any], | |
tracing_context: Dict[str, Any], |
sender_user_id: str, | ||
message_type: str, | ||
messages: Dict[str, Dict[str, JsonDict]], | ||
context: Dict[str, Any], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of passing this in, can we just call tracing_context = get_active_span_text_map()
in the function?
sender_user_id: str, | ||
message_type: str, | ||
context: Dict[str, Any], | ||
message_id: str = random_string(16), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will work correctly and I'm surprised this isn't causing a lint to be triggered (we had a lint for this before). I guess this is something we need to re-enable -> https://docs.astral.sh/ruff/rules/function-call-in-default-argument/
random_string(16)
will only be executed once and be shared across all messages. We should also have a test that catches this as well.
edu_contents = get_device_message_edu_contents( | ||
sender_user_id, message_type, messages, context | ||
) | ||
remote_edu_contents[destination] = edu_contents |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't really investigated the intricacies here but my leaning would be on the simple solution I suggested.
Performance wasn't the thing we're trying to address.
) | ||
|
||
if len(current_edu_content["messages"]) > 0: | ||
message_entry_size += 1 # Add 1 for the comma |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Give an example.
This is also confused because we also state "add 1 for the comma per message" above. Which comma's? Is having both correct?
|
||
for recipient, message in messages.items(): | ||
# We remove 2 for the curly braces and add 1 for the colon | ||
message_entry_size = len(encode_canonical_json({recipient: message})) - 2 + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The manual message serialization is a bit error prone and confusing to follow.
@erikjohnston's suggestion sounds good to me if you're willing to adapt
If a set of messages exceeds this limit, the messages are splitted across several EDUs.
Should fix #17035.
There is currently no official specced limit for EDUs, but the consensus seems to be that it would be useful to have one to avoid this bug by bounding the transaction size.
As a side effect it also limits the size of a single to-device message to a bit less than 65536.
This should probably be added to the spec similarly to the message size limit.
Pull Request Checklist