Federation catchup stalls due to "concurrent access" exception thrown in the process_event_queue task #9635
Description
sender1_1 | 2021-03-16 20:00:14,270 - synapse.storage.txn - 538 - WARNING - process_event_queue_for_federation-19 - [TXN OPERROR] {store_destination_rooms_entries-9e4f} could not serialize access due to concurrent update
sender1_1 | 2021-03-16 20:00:14,335 - synapse.storage.txn - 538 - WARNING - process_event_queue_for_federation-19 - [TXN OPERROR] {store_destination_rooms_entries-9e4f} could not serialize access due to concurrent update
sender1_1 | 2021-03-16 20:00:14,380 - synapse.storage.txn - 538 - WARNING - process_event_queue_for_federation-19 - [TXN OPERROR] {store_destination_rooms_entries-9e4f} could not serialize access due to concurrent update
sender1_1 | 2021-03-16 20:00:14,414 - synapse.storage.txn - 538 - WARNING - process_event_queue_for_federation-19 - [TXN OPERROR] {store_destination_rooms_entries-9e4f} could not serialize access due to concurrent update
sender1_1 | 2021-03-16 20:00:14,451 - synapse.storage.txn - 538 - WARNING - process_event_queue_for_federation-19 - [TXN OPERROR] {store_destination_rooms_entries-9e4f} could not serialize access due to concurrent update
sender1_1 | 2021-03-16 20:00:14,510 - synapse.storage.txn - 538 - WARNING - process_event_queue_for_federation-19 - [TXN OPERROR] {store_destination_rooms_entries-9e4f} could not serialize access due to concurrent update
sender1_1 | 2021-03-16 20:00:14,771 - synapse.metrics.background_process_metrics - 210 - ERROR - process_event_queue_for_federation-19 - Background process 'process_event_queue_for_federation' threw an exception
sender1_1 | 2021-03-16 20:00:29,184 - synapse.federation.sender - 187 - DEBUG - process_event_queue_for_federation-20 - Handling 131486 -> 131586
My federation sender is stuck in a simple loop: it tries to catch up on events by calling handle_event over batches of 100 events; that calls _send_pdu for every event, which in turn calls store_destination_rooms_entries, which throws an exception ("could not serialize access due to concurrent update"). The process_event_queue_for_federation task then dies, its finally: block sets self._is_processing = False, and another process-event-queue task can be started by notify_new_events (which starts again from scratch).
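To make that control flow concrete, here is a minimal, self-contained sketch of the loop as described above. It is not Synapse's actual implementation; the class, the artificial 1% failure rate, and the stand-in SerializationFailure exception are invented for illustration, and only the method names mirror the ones mentioned above.

```python
import asyncio
import random


class SerializationFailure(Exception):
    """Stand-in for psycopg2.errors.SerializationFailure."""


class ToyFederationSender:
    """Toy model of the flow described above -- not Synapse's real code."""

    def __init__(self, pending_events):
        self.pending_events = pending_events  # events still to catch up on, oldest first
        self._is_processing = False

    async def notify_new_events(self):
        # Only one catch-up task at a time, guarded by a plain boolean flag.
        if self._is_processing:
            return
        self._is_processing = True
        try:
            await self._process_event_queue_loop()
        except SerializationFailure as e:
            # The whole task dies here; progress made in this run is forgotten.
            print(f"background process threw an exception: {e}")
        finally:
            # The step the issue refers to: the flag is cleared even on failure,
            # so the next notification restarts the catch-up from scratch.
            self._is_processing = False

    async def _process_event_queue_loop(self):
        for start in range(0, len(self.pending_events), 100):  # batches of 100
            for event in self.pending_events[start:start + 100]:
                await self._send_pdu(event)

    async def _send_pdu(self, event):
        # Placeholder for the store_destination_rooms_entries write; fails now
        # and then, the way a conflicting concurrent UPDATE would.
        await asyncio.sleep(0)
        if random.random() < 0.01:
            raise SerializationFailure(
                "could not serialize access due to concurrent update"
            )


asyncio.run(ToyFederationSender(list(range(1_000))).notify_new_events())
```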
Here's the error:
sender1_1 | 2021-03-16 20:02:25,657 - synapse.metrics.background_process_metrics - 210 - ERROR - process_event_queue_for_federation-24 - Background process 'process_event_queue_for_federation' threw an exception
sender1_1 | Traceback (most recent call last):
sender1_1 | File "/usr/local/lib/python3.8/site-packages/synapse/metrics/background_process_metrics.py", line 208, in run
sender1_1 | return await maybe_awaitable(func(*args, **kwargs))
sender1_1 | File "/usr/local/lib/python3.8/site-packages/synapse/federation/sender/__init__.py", line 272, in _process_event_queue_loop
sender1_1 | await make_deferred_yieldable(
sender1_1 | twisted.internet.defer.FirstError: FirstError[#0, [Failure instance: Traceback: <class 'psycopg2.errors.SerializationFailure'>: could not serialize access due to concurrent update
sender1_1 |
sender1_1 | /usr/local/lib/python3.8/site-packages/twisted/internet/defer.py:517:errback
sender1_1 | /usr/local/lib/python3.8/site-packages/twisted/internet/defer.py:580:_startRunCallbacks
sender1_1 | /usr/local/lib/python3.8/site-packages/twisted/internet/defer.py:662:_runCallbacks
sender1_1 | /usr/local/lib/python3.8/site-packages/twisted/internet/defer.py:1514:gotResult
sender1_1 | --- <exception caught here> ---
sender1_1 | /usr/local/lib/python3.8/site-packages/twisted/internet/defer.py:1443:_inlineCallbacks
sender1_1 | /usr/local/lib/python3.8/site-packages/twisted/python/failure.py:500:throwExceptionIntoGenerator
sender1_1 | /usr/local/lib/python3.8/site-packages/synapse/federation/sender/__init__.py:266:handle_room_events
sender1_1 | /usr/local/lib/python3.8/site-packages/synapse/federation/sender/__init__.py:254:handle_event
sender1_1 | /usr/local/lib/python3.8/site-packages/synapse/federation/sender/__init__.py:330:_send_pdu
sender1_1 | /usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/transactions.py:315:store_destination_rooms_entries
sender1_1 | /usr/local/lib/python3.8/site-packages/synapse/storage/database.py:660:runInteraction
sender1_1 | /usr/local/lib/python3.8/site-packages/synapse/storage/database.py:743:runWithConnection
sender1_1 | /usr/local/lib/python3.8/site-packages/twisted/python/threadpool.py:238:inContext
sender1_1 | /usr/local/lib/python3.8/site-packages/twisted/python/threadpool.py:254:<lambda>
sender1_1 | /usr/local/lib/python3.8/site-packages/twisted/python/context.py:118:callWithContext
sender1_1 | /usr/local/lib/python3.8/site-packages/twisted/python/context.py:83:callWithContext
sender1_1 | /usr/local/lib/python3.8/site-packages/twisted/enterprise/adbapi.py:293:_runWithConnection
sender1_1 | /usr/local/lib/python3.8/site-packages/twisted/python/deprecate.py:298:deprecatedFunction
sender1_1 | /usr/local/lib/python3.8/site-packages/twisted/python/compat.py:403:reraise
sender1_1 | /usr/local/lib/python3.8/site-packages/twisted/enterprise/adbapi.py:284:_runWithConnection
sender1_1 | /usr/local/lib/python3.8/site-packages/synapse/storage/database.py:738:inner_func
sender1_1 | /usr/local/lib/python3.8/site-packages/synapse/storage/database.py:532:new_transaction
sender1_1 | /usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/transactions.py:347:_store_destination_rooms_entries_txn
sender1_1 | /usr/local/lib/python3.8/site-packages/synapse/storage/database.py:275:execute_batch
sender1_1 | /usr/local/lib/python3.8/site-packages/synapse/storage/database.py:320:_do_execute
sender1_1 | /usr/local/lib/python3.8/site-packages/synapse/storage/database.py:275:<lambda>
sender1_1 | /usr/local/lib/python3.8/site-packages/psycopg2/extras.py:1209:execute_batch
sender1_1 | ]]
This causes my federation sender to "stall": the transmission loops keep sending events and (probably) updating the destination_rooms table at the same time, which makes the process_event_queue_for_federation task fail, "stall", and start the catch-up loop over again.
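For reference, the underlying PostgreSQL behaviour is easy to reproduce outside Synapse: under REPEATABLE READ (or SERIALIZABLE) isolation, a transaction that updates a row already changed by a concurrently committed transaction is aborted with exactly this error. The sketch below assumes a reachable database given by DSN and uses a made-up demo_rooms table; it only shows where psycopg2.errors.SerializationFailure comes from, not Synapse's actual queries.

```python
import psycopg2
from psycopg2 import errors
from psycopg2.extensions import ISOLATION_LEVEL_REPEATABLE_READ

DSN = "dbname=test user=test"  # hypothetical connection string, adjust as needed

# One-off setup: a throwaway table with a single row (not a Synapse table).
setup = psycopg2.connect(DSN)
setup.autocommit = True
with setup.cursor() as cur:
    cur.execute(
        "CREATE TABLE IF NOT EXISTS demo_rooms (room_id TEXT PRIMARY KEY, stream_ordering BIGINT)"
    )
    cur.execute(
        "INSERT INTO demo_rooms VALUES (%s, 1) "
        "ON CONFLICT (room_id) DO UPDATE SET stream_ordering = 1",
        ("!room:example.org",),
    )
setup.close()

conn_a = psycopg2.connect(DSN)
conn_b = psycopg2.connect(DSN)
for conn in (conn_a, conn_b):
    conn.set_isolation_level(ISOLATION_LEVEL_REPEATABLE_READ)

with conn_a.cursor() as cur_a, conn_b.cursor() as cur_b:
    # Both transactions take their snapshot before either one commits.
    cur_a.execute("SELECT stream_ordering FROM demo_rooms WHERE room_id = %s", ("!room:example.org",))
    cur_b.execute("SELECT stream_ordering FROM demo_rooms WHERE room_id = %s", ("!room:example.org",))

    # Transaction A updates the row and commits first.
    cur_a.execute("UPDATE demo_rooms SET stream_ordering = 2 WHERE room_id = %s", ("!room:example.org",))
    conn_a.commit()

    try:
        # Transaction B now touches a row that a concurrent transaction changed
        # after B's snapshot was taken, so PostgreSQL aborts B.
        cur_b.execute("UPDATE demo_rooms SET stream_ordering = 3 WHERE room_id = %s", ("!room:example.org",))
        conn_b.commit()
    except errors.SerializationFailure as e:
        print(e)  # could not serialize access due to concurrent update
        conn_b.rollback()

conn_a.close()
conn_b.close()
```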