-
-
Notifications
You must be signed in to change notification settings - Fork 16
Description
Context
Show context (click to expand)
When a client creates a new message, it sends a new message with a timestamp & raw_time
set to True
:
// from LabChatModel.sendMessage()
const msg: IYmessage = {
type: 'msg',
id: UUID.uuid4(),
body: message.body,
time: Date.now() / 1000,
sender: this._user.username,
raw_time: true
};
Then, the server extension calls self._set_timestamp()
concurrently when a new message with raw_time=True
is received:
# from YChat._on_messages_change()
for idx in range(index, index + inserted_count):
message_dict = self._ymessages[idx]
if message_dict and message_dict.get("raw_time", True): # type:ignore[attr-defined]
self.create_task(self._set_timestamp(idx, timestamp))
This then deletes the message and re-adds it with raw_time
set to False
, while ensuring that the new message is inserted at the correct position:
async def _set_timestamp(self, msg_idx: int, timestamp: float):
with self._ydoc.transaction():
# Remove the message from the list and modify the timestamp
try:
message_dict = self._ymessages[msg_idx]
except IndexError:
return
message_dict["time"] = timestamp # type:ignore[index]
message_dict["raw_time"] = False # type:ignore[index]
self._ymessages[msg_idx] = message_dict
new_idx = len(self._ymessages) - next((i for i, v in enumerate(self._get_messages()[::-1]) if v["time"] < timestamp), len(self._ymessages))
if msg_idx != new_idx:
message_dict = self._ymessages.pop(msg_idx)
self._ymessages.insert(new_idx, message_dict)
Description
I think that this logic surrounding the raw_time
property is not necessary, and is doubling memory & disk use.
Why raw_time
may not be necessary
Each Yjs list is guaranteed to be eventually consistent. Each new item in a Yjs list is represented as a doubly-linked list node, that stores a reference to the previous & next items.
If some user A
sends a message, it sends a SyncUpdate message to the server extension, which then broadcasts that SyncUpdate to other clients. Therefore, it will take another user B
some latency L
to receive that update, e.g. 500 ms
.
After L
time passes, user B
will receive the new message from A
. Since new items in a list store references to the previous & next items in the list, any new message B
sends is guaranteed to follow the previous message from A
.
An edge case does occur when B
sends a new message within that latency L
, e.g. B
sends a new message just 5 ms
after A
sends a new message. When this occurs, Yjs will decide how the new messages are ordered based on their client ID & client clock, and the order is effectively arbitrary.
However, even in this case, I do not think we should re-order the messages, because it's impossible to know which client sent their message first. Different clients have different latencies, and we can't trust client-side timestamps unless we somehow implement NTP over WebSockets.
- Example: user
A
may live in the US and have a latency of50 ms
while userB
may live in Australia and have a latency of5000 ms
. If userB
sends a message att = 0 ms
, userA
can send a message att = 4000 ms
, and the server will receiveA
's message first, even thoughB
sent their message first.
Why raw_time
may be doubling memory & disk use
All Yjs documents are append-only. When we "delete" an item from a list (e.g. deleting a previous message), Yjs is just marking that item as "deleted" but keeping it in the YDoc. This is necessary to resolve edge cases where user A
deletes a message at the same time user B
adds a new one.
Therefore, deleting & re-adding every new user message will double memory & disk use, since every new message now produces 2 Yjs items: one with raw_time=True
(that gets quickly marked as deleted), and another with raw_time=False
(that persists and shows in the chat UI).
Also relates to #203.
Other quirks caused by raw_time
This also impacts the event listener since 2 new message insert events are created on every new message (one with raw_time=True
and another with raw_time=False
. See this review comment: https://github.com/jupyterlab/jupyter-ai/pull/1324/files/49aaa3908d8f755c2921944f28f3a3566035d960#r2046884004
Proposed Solution
- Drop
raw_time
fromMessage
. - Update
Message.time
to be optional, e.g.time: Optional[float] = None
. - Update
_set_timestamp()
to just setMessage.time
as soon as it receives a message, while not deleting & re-ordering messages.
After this change, timestamps may be out-of-order in rare cases. I don't think we have verified whether this will happen, so we should remove the existing logic to see what happens. If Yjs handles this well and timestamps are consistent 99.99% of the time, then we don't need to do anything. Otherwise, we can fix this later in a more performant way that doesn't double our memory & disk use.