Skip to content
This repository was archived by the owner on Apr 26, 2024. It is now read-only.
This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Host dead after caches seem to break possibly due to multiple room delete commands in short period of time #10792

@jaywink

Description

@jaywink

Description

From the logs it looks like a host admin deleted a room multiple times in a row within a few minutes. Initially this caused 403's for message sending for a bot user, which makes sense, but this soon turned to 500's until restarted, after which a 403 was again returned. Also some time after the room delete activity, a sync started to throw long stacktraces with 500 error. This is unconfirmed to have been fixed after restart, but it seems likely given the stack mentions caches.

From logs:

  • Multiple events of the following events within a few minutes:
    • synapse.handlers.room - 1339 - INFO - DELETE-6639- Shutting down room '!roomid:domain.tld'
    • synapse.handlers.room - 1348 - INFO - DELETE-6639- Kicking '@botuser:domain.tld' from '!roomid:domain.tld'...
  • @botuser:domain.tld starts to see 403
  • Purge events seems to run multiple times for the rooms within a short period, with various events like this
    • synapse.storage.databases.main.purge_events - 412 - INFO - DELETE-6639- [purge] removing !roomid:domain.tld from event_push_summary
  • Room send 403 turns into 500:
> 2021-09-09 18:02:40,129 - synapse.http.server - 93 - ERROR - PUT-6693- Failed handle request via 'RoomSendEventRestServlet': <XForwardedForRequest at 0x7f978dff48b0 method='PUT' uri='/_matrix/client/r0/rooms/!roomid:domain.tld/send/m.room.message/m1627686538.10576?access_token=<redacted>' clientproto='HTTP/1.1' site='8008'>
> Sep 9, 2021 @ 18:02:40.132synapse
Traceback (most recent call last):
Sep 9, 2021 @ 18:02:40.132synapse
  File "/usr/local/lib/python3.8/site-packages/twisted/internet/defer.py", line 1661, in _inlineCallbacks
Sep 9, 2021 @ 18:02:40.132synapse
    result = current_context.run(gen.send, result)
Sep 9, 2021 @ 18:02:40.132synapse
  File "/usr/local/lib/python3.8/site-packages/synapse/rest/client/room.py", line 242, in on_POST
Sep 9, 2021 @ 18:02:40.132synapse
    ) = await self.event_creation_handler.create_and_send_nonmember_event(
Sep 9, 2021 @ 18:02:40.132synapse
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/message.py", line 863, in create_and_send_nonmember_event
Sep 9, 2021 @ 18:02:40.132synapse
    event, context = await self.create_event(
Sep 9, 2021 @ 18:02:40.132synapse
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/message.py", line 623, in create_event
Sep 9, 2021 @ 18:02:40.132synapse
    event, context = await self.create_new_client_event(
Sep 9, 2021 @ 18:02:40.132synapse
  File "/usr/local/lib/python3.8/site-packages/synapse/util/metrics.py", line 91, in measured_func
Sep 9, 2021 @ 18:02:40.132synapse
    r = await func(self, *args, **kwargs)
Sep 9, 2021 @ 18:02:40.132synapse
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/message.py", line 943, in create_new_client_event
Sep 9, 2021 @ 18:02:40.132synapse
    assert (
Sep 9, 2021 @ 18:02:40.132synapse
AssertionError: Attempting to create an event with no prev_events

The admin continues to create and delete rooms (scripted based on number of API calls and them happening over a short period of time for the same rooms). Later sync starts failing for the admin operating on behalf of another user:

2021-09-10 00:20:08,523 - synapse.access.http.8009 - 389 - INFO - GET-27763- 76.104.170.72 - 8009 - {@admin:domain.tld.@someuser:domain.tld} Processed request: 0.088sec/-0.000sec (0.064sec, 0.008sec) (0.045sec/0.162sec/14) 55B 500 "GET /_matrix/client/r0/sync?filter=1&timeout=0&_cacheBuster=1631233209635&access_token=<redacted> HTTP/1.1" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36" [0 dbevts]

2021-09-10 00:20:15,977 - synapse.http.server - 93 - ERROR - GET-27765- Failed handle request via 'SyncRestServlet': <XForwardedForRequest at 0x7fcd04c930c0 method='GET' uri='/_matrix/client/r0/sync?filter=1&timeout=0&_cacheBuster=1631233217123&access_token=<redacted>' clientproto='HTTP/1.1' site=8009>
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
Traceback (most recent call last):
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
  File "/usr/local/lib/python3.8/site-packages/twisted/internet/defer.py", line 1657, in _inlineCallbacks
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
    result = current_context.run(
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
  File "/usr/local/lib/python3.8/site-packages/twisted/python/failure.py", line 500, in throwExceptionIntoGenerator
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
    return g.throw(self.type, self.value, self.tb)
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/sync.py", line 362, in _wait_for_sync_for_user
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
    result: SyncResult = await self.current_sync_for_user(
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/sync.py", line 406, in current_sync_for_user
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
    sync_result = await self.generate_sync_result(
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/sync.py", line 1072, in generate_sync_result
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
    res = await self._generate_sync_entry_for_rooms(
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/sync.py", line 1524, in _generate_sync_entry_for_rooms
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
    await concurrently_execute(handle_room_entries, room_entries, 10)
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
  File "/usr/local/lib/python3.8/site-packages/twisted/internet/defer.py", line 1661, in _inlineCallbacks
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
    result = current_context.run(gen.send, result)
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
  File "/usr/local/lib/python3.8/site-packages/synapse/util/async_helpers.py", line 191, in _concurrently_execute_inner
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
    await maybe_awaitable(func(value))
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/sync.py", line 1512, in handle_room_entries
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
    res = await self._generate_room_entry(
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/sync.py", line 1936, in _generate_room_entry
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
    batch = await self._load_filtered_recents(
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/sync.py", line 583, in _load_filtered_recents
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
    loaded_recents = await filter_events_for_client(
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
  File "/usr/local/lib/python3.8/site-packages/synapse/visibility.py", line 84, in filter_events_for_client
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
    event_id_to_state = await storage.state.get_state_for_events(
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
  File "/usr/local/lib/python3.8/site-packages/synapse/storage/state.py", line 466, in get_state_for_events
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
    group_to_state = await self.stores.state._get_state_for_groups(
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
  File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/state/store.py", line 237, in _get_state_for_groups
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
    (member_state, incomplete_groups_m,) = self._get_state_for_groups_using_cache(
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
  File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/state/store.py", line 301, in _get_state_for_groups_using_cache
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
    state_dict_ids, got_all = self._get_state_for_group_using_cache(
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
  File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/state/store.py", line 185, in _get_state_for_group_using_cache
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
    cache_entry = cache.get(group)
Sep 10, 2021 @ 00:20:15.979synapse-synchrotron
  File "/usr/local/lib/python3.8/site-packages/synapse/util/caches/dictionary_cache.py", line 96, in get
Sep 10, 2021 @ 00:20:15.980synapse-synchrotron
    return f(*args, **kwargs)
Sep 10, 2021 @ 00:20:15.980synapse-synchrotron
  File "/usr/local/lib/python3.8/site-packages/synapse/util/caches/lrucache.py", line 483, in cache_get
Sep 10, 2021 @ 00:20:15.980synapse-synchrotron
    move_node_to_front(node)
Sep 10, 2021 @ 00:20:15.980synapse-synchrotron
  File "/usr/local/lib/python3.8/site-packages/synapse/util/caches/lrucache.py", line 439, in move_node_to_front
Sep 10, 2021 @ 00:20:15.980synapse-synchrotron
    node.move_to_front(real_clock, list_root)
Sep 10, 2021 @ 00:20:15.980synapse-synchrotron
  File "/usr/local/lib/python3.8/site-packages/synapse/util/caches/lrucache.py", line 294, in move_to_front
Sep 10, 2021 @ 00:20:15.980synapse-synchrotron
    self._list_node.move_after(cache_list_root)
Sep 10, 2021 @ 00:20:15.980synapse-synchrotron
  File "/usr/local/lib/python3.8/site-packages/synapse/util/linked_list.py", line 93, in move_after
Sep 10, 2021 @ 00:20:15.980synapse-synchrotron
    assert self.prev_node
Sep 10, 2021 @ 00:20:15.980synapse-synchrotron
AssertionError

Until Synapse is restarted, the event send graph looks completely dead.

image

Restart seems to have made everything happy again (no more 500, a good old 403 for event sending as we should).

Not sure if the admin would still see a sync error, but looking at stacktrace it's probably fixed by nuking the caches on restart?

Version information

  • Homeserver: EMS customer, please ping for host name and other details
  • Version: v1.42.0
  • Install method: EMS docker images
  • Workers: Synchrotron + initial synchrotron

Metadata

Metadata

Assignees

No one assigned

    Labels

    S-MajorMajor functionality / product severely impaired, no satisfactory workaround.T-DefectBugs, crashes, hangs, security vulnerabilities, or other reported issues.z-p2(Deprecated Label)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions