Skip to content
This repository was archived by the owner on Apr 26, 2024. It is now read-only.
This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Federation catchup stalls due to "concurrent access" exception thrown in the process_event_queue task #9635

@ShadowJonathan

Description

@ShadowJonathan
sender1_1        | 2021-03-16 20:00:14,270 - synapse.storage.txn - 538 - WARNING - process_event_queue_for_federation-19 - [TXN OPERROR] {store_destination_rooms_entries-9e4f} could not serialize access due to concurrent update
sender1_1        | 2021-03-16 20:00:14,335 - synapse.storage.txn - 538 - WARNING - process_event_queue_for_federation-19 - [TXN OPERROR] {store_destination_rooms_entries-9e4f} could not serialize access due to concurrent update
sender1_1        | 2021-03-16 20:00:14,380 - synapse.storage.txn - 538 - WARNING - process_event_queue_for_federation-19 - [TXN OPERROR] {store_destination_rooms_entries-9e4f} could not serialize access due to concurrent update
sender1_1        | 2021-03-16 20:00:14,414 - synapse.storage.txn - 538 - WARNING - process_event_queue_for_federation-19 - [TXN OPERROR] {store_destination_rooms_entries-9e4f} could not serialize access due to concurrent update
sender1_1        | 2021-03-16 20:00:14,451 - synapse.storage.txn - 538 - WARNING - process_event_queue_for_federation-19 - [TXN OPERROR] {store_destination_rooms_entries-9e4f} could not serialize access due to concurrent update
sender1_1        | 2021-03-16 20:00:14,510 - synapse.storage.txn - 538 - WARNING - process_event_queue_for_federation-19 - [TXN OPERROR] {store_destination_rooms_entries-9e4f} could not serialize access due to concurrent update
sender1_1        | 2021-03-16 20:00:14,771 - synapse.metrics.background_process_metrics - 210 - ERROR - process_event_queue_for_federation-19 - Background process 'process_event_queue_for_federation' threw an exception
sender1_1        | 2021-03-16 20:00:29,184 - synapse.federation.sender - 187 - DEBUG - process_event_queue_for_federation-20 - Handling 131486 -> 131586

My federation sender is in a simple loop: it tries to catch up with events, which it does by doing handle_event over batches of 100 events, which calls _send_pdu on every event, which calls store_destination_rooms_entries, which throws an exception due to could not serialize access due to concurrent update, process_event_queue_for_federation task dies, it's finally: does self._is_processing = False, and then another process event queue task can be started in notify_new_events (which starts from scratch)

Here's the error:

sender1_1        | 2021-03-16 20:02:25,657 - synapse.metrics.background_process_metrics - 210 - ERROR - process_event_queue_for_federation-24 - Background process 'process_event_queue_for_federation' threw an exception
sender1_1        | Traceback (most recent call last):
sender1_1        |   File "/usr/local/lib/python3.8/site-packages/synapse/metrics/background_process_metrics.py", line 208, in run
sender1_1        |     return await maybe_awaitable(func(*args, **kwargs))
sender1_1        |   File "/usr/local/lib/python3.8/site-packages/synapse/federation/sender/__init__.py", line 272, in _process_event_queue_loop
sender1_1        |     await make_deferred_yieldable(
sender1_1        | twisted.internet.defer.FirstError: FirstError[#0, [Failure instance: Traceback: <class 'psycopg2.errors.SerializationFailure'>: could not serialize access due to concurrent update
sender1_1        | 
sender1_1        | /usr/local/lib/python3.8/site-packages/twisted/internet/defer.py:517:errback
sender1_1        | /usr/local/lib/python3.8/site-packages/twisted/internet/defer.py:580:_startRunCallbacks
sender1_1        | /usr/local/lib/python3.8/site-packages/twisted/internet/defer.py:662:_runCallbacks
sender1_1        | /usr/local/lib/python3.8/site-packages/twisted/internet/defer.py:1514:gotResult
sender1_1        | --- <exception caught here> ---
sender1_1        | /usr/local/lib/python3.8/site-packages/twisted/internet/defer.py:1443:_inlineCallbacks
sender1_1        | /usr/local/lib/python3.8/site-packages/twisted/python/failure.py:500:throwExceptionIntoGenerator
sender1_1        | /usr/local/lib/python3.8/site-packages/synapse/federation/sender/__init__.py:266:handle_room_events
sender1_1        | /usr/local/lib/python3.8/site-packages/synapse/federation/sender/__init__.py:254:handle_event
sender1_1        | /usr/local/lib/python3.8/site-packages/synapse/federation/sender/__init__.py:330:_send_pdu
sender1_1        | /usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/transactions.py:315:store_destination_rooms_entries
sender1_1        | /usr/local/lib/python3.8/site-packages/synapse/storage/database.py:660:runInteraction
sender1_1        | /usr/local/lib/python3.8/site-packages/synapse/storage/database.py:743:runWithConnection
sender1_1        | /usr/local/lib/python3.8/site-packages/twisted/python/threadpool.py:238:inContext
sender1_1        | /usr/local/lib/python3.8/site-packages/twisted/python/threadpool.py:254:<lambda>
sender1_1        | /usr/local/lib/python3.8/site-packages/twisted/python/context.py:118:callWithContext
sender1_1        | /usr/local/lib/python3.8/site-packages/twisted/python/context.py:83:callWithContext
sender1_1        | /usr/local/lib/python3.8/site-packages/twisted/enterprise/adbapi.py:293:_runWithConnection
sender1_1        | /usr/local/lib/python3.8/site-packages/twisted/python/deprecate.py:298:deprecatedFunction
sender1_1        | /usr/local/lib/python3.8/site-packages/twisted/python/compat.py:403:reraise
sender1_1        | /usr/local/lib/python3.8/site-packages/twisted/enterprise/adbapi.py:284:_runWithConnection
sender1_1        | /usr/local/lib/python3.8/site-packages/synapse/storage/database.py:738:inner_func
sender1_1        | /usr/local/lib/python3.8/site-packages/synapse/storage/database.py:532:new_transaction
sender1_1        | /usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/transactions.py:347:_store_destination_rooms_entries_txn
sender1_1        | /usr/local/lib/python3.8/site-packages/synapse/storage/database.py:275:execute_batch
sender1_1        | /usr/local/lib/python3.8/site-packages/synapse/storage/database.py:320:_do_execute
sender1_1        | /usr/local/lib/python3.8/site-packages/synapse/storage/database.py:275:<lambda>
sender1_1        | /usr/local/lib/python3.8/site-packages/psycopg2/extras.py:1209:execute_batch
sender1_1        | ]]

This causes my federation sender to "stall", with transmission loops constantly sending events while (probably) updating the destination_room table, which causes the process_event_queue_for_federation task to "stall" and start to loop.

Metadata

Metadata

Assignees

No one assigned

    Labels

    S-MinorBlocks non-critical functionality, workarounds exist.T-DefectBugs, crashes, hangs, security vulnerabilities, or other reported issues.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions