[Postgres] Remove usage of pg_logical_slot_peek_binary_changes #390
Background
For Postgres, `pg_logical_slot_peek_binary_changes` was used as a "health check" before starting replication, to confirm that we can actually read from the replication slot. For example, if the related publication was deleted, we'd get a `publication "powersync" does not exist` error (pg < 18 at least).
However, there are scenarios where this approach could add very high load on the source database. To explain this, we need some background on logical decoding in Postgres:
Postgres Logical Decoding
In Postgres, all changes are written to the write-ahead log (WAL). A global log sequence number (LSN) is used to track a position in the WAL.
Each replication slot keeps track of the LSN of the last transaction that was replicated by the client (such as PowerSync).
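To make the LSN bookkeeping concrete, here is a minimal sketch that is not taken from the PowerSync code base. It assumes the textual `pg_lsn` format of two hexadecimal numbers separated by a slash (for example `16/B374D848`); the helper names `parse_lsn` and `wal_bytes_between` are hypothetical. The difference between the current WAL position and a slot's position gives a rough idea of how much WAL logical decoding may have to read through.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' into a 64-bit byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits, the part after is the lower 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def wal_bytes_between(slot_lsn: str, current_lsn: str) -> int:
    """Approximate WAL volume (in bytes) between a slot position and the current position."""
    return parse_lsn(current_lsn) - parse_lsn(slot_lsn)


if __name__ == "__main__":
    # A slot that is 0x40000000 bytes behind has about 1 GiB of WAL to get through.
    print(wal_bytes_between("0/0", "0/40000000"))  # 1073741824
```

On the server side, `pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)` against `pg_replication_slots` should give the same kind of number without any client-side parsing.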
When the client requests changes via logical replication, either via a streaming connection or via queries such as `pg_logical_slot_peek_binary_changes`, a process called logical decoding decodes data from the WAL, filters it according to the publication configuration and client settings, and streams the decoded data to the client. In some cases, Postgres has to read through a lot of irrelevant data before it gets to the data relevant to the client. Some examples include:
So what could happen is that multiple gigabytes of data are written to the WAL in sequence, none of which is replicated to the client (PowerSync Service). Each time the `pg_logical_slot_peek_binary_changes` query runs, it attempts to scan through all of that data, but times out before it finds any data to return to the client. Since no data is returned, the client cannot advance the slot’s LSN, so every retry scans through the same data again.
Reproducing the issue
We can generate large amounts of WAL data using a script like this on the source database (can paste into psql):
The actual update query is not important - the point is that we generate large volumes of data in the WAL (this one generates about 1GB of WAL data in the 40s limit in my test db), and then do a ROLLBACK to ensure the data is not actually replicated.
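To see why rolled-back WAL of this size is a problem for a peek-style check, the behaviour described earlier can be sketched as a toy model. This is an illustration under stated assumptions, not how Postgres is implemented: the statement timeout is modelled as a byte budget per attempt, the streaming side is assumed to be interrupted on the same budget purely for comparison, and all names (`Tx`, `peek_retries`, `streaming`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tx:
    wal_bytes: int    # WAL volume written by the transaction
    replicated: bool  # False for e.g. a rolled-back transaction


def peek_retries(wal: List[Tx], budget: int, max_attempts: int) -> dict:
    """Peek-style check: every attempt restarts from the same slot position."""
    scanned = 0
    for attempt in range(1, max_attempts + 1):
        remaining = budget
        timed_out = False
        for tx in wal:
            if tx.wal_bytes > remaining:
                scanned += remaining  # work done before the modelled timeout
                timed_out = True
                break
            remaining -= tx.wal_bytes
            scanned += tx.wal_bytes
            if tx.replicated:
                break  # found data to return to the client
        if not timed_out:
            return {"ok": True, "attempts": attempt, "scanned": scanned}
    return {"ok": False, "attempts": max_attempts, "scanned": scanned}


def streaming(wal: List[Tx], budget: int, max_attempts: int) -> dict:
    """Streaming: the slot position advances after each fully scanned transaction."""
    scanned = 0
    position = 0  # index of the first transaction not yet passed by the slot
    for attempt in range(1, max_attempts + 1):
        remaining = budget
        while position < len(wal):
            tx = wal[position]
            if tx.wal_bytes > remaining:
                scanned += remaining
                break  # interrupted; the next attempt resumes from `position`
            remaining -= tx.wal_bytes
            scanned += tx.wal_bytes
            position += 1
            if tx.replicated:
                return {"ok": True, "attempts": attempt, "scanned": scanned}
        if position >= len(wal):
            return {"ok": True, "attempts": attempt, "scanned": scanned}
    return {"ok": False, "attempts": max_attempts, "scanned": scanned}


if __name__ == "__main__":
    # Three large rolled-back transactions followed by one small replicated change.
    wal = [Tx(1000, False), Tx(1000, False), Tx(1000, False), Tx(10, True)]
    print(peek_retries(wal, budget=1500, max_attempts=5))
    print(streaming(wal, budget=1500, max_attempts=5))
```

In this model the peek-style check never succeeds and keeps re-reading the same prefix of the WAL, while the streaming variant makes progress on every attempt and eventually reaches the replicated change.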
A couple of runs of the script above while the PowerSync service is paused is enough to trigger a statement timeout on `pg_logical_slot_peek_binary_changes`. That would cause an indefinite retry loop, adding large CPU and IOPS load on the source database.
A further complication is that statements that timed out do not show up in `pg_stat_statements` (the "slow query log" for Postgres), making the performance issue difficult to diagnose.
The fix
The fix is essentially just removing the `pg_logical_slot_peek_binary_changes` query. Instead, we rely on two different mechanisms to check the slot health:
- The `wal_status` column from `pg_replication_slots` (available in Postgres 13+) - this covers the typical cases like `max_slot_wal_keep_size` exceeded. Support for this check was added in [Postgres] Improve handling of "lost" replication slots #387.
- Errors such as `publication "powersync" does not exist`, reported as part of the actual logical replication stream.
This should cover the same scenarios as before, but is more robust, and may reduce replication startup times in some cases.
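As an illustration of the `wal_status` mechanism, here is a minimal sketch of how a client could interpret the column. It is not the implementation from #387: the function name `slot_health`, its return labels, and the policy for unknown values are assumptions. The status values `reserved`, `extended`, `unreserved` and `lost` are the ones documented for `pg_replication_slots` in Postgres 13+.

```python
from typing import Optional

# Query a caller would run with its own Postgres driver (not executed here).
# The %s placeholder style is an assumption; adjust it to the driver in use.
SLOT_STATUS_SQL = "SELECT wal_status FROM pg_replication_slots WHERE slot_name = %s"


def slot_health(wal_status: Optional[str]) -> str:
    """Map a wal_status value to a hypothetical action label."""
    if wal_status == "lost":
        # Required WAL has been removed; the slot can no longer be used.
        return "recreate"
    if wal_status == "unreserved":
        # Required WAL may be removed at the next checkpoint.
        return "at_risk"
    # 'reserved', 'extended', or None (e.g. column unavailable before Postgres 13):
    # start streaming and rely on errors reported by the replication stream.
    return "ok"


if __name__ == "__main__":
    for status in ("reserved", "extended", "unreserved", "lost", None):
        print(status, "->", slot_health(status))
```

This check is a cheap catalog lookup, so unlike the peek query it does not need to decode any WAL before replication starts.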
For startup times specifically, if there are multiple large rolled-back transactions in a sequence (the test case above):
- `pg_logical_slot_peek_binary_changes` would just time out.
- With streaming, the slot position (`restart_lsn`) would advance after scanning through each transaction, despite no actual data being streamed.
Alternatives considered
I considered just changing how we use `pg_logical_slot_peek_binary_changes`:
However, it does not appear that this check actually adds value over just starting streaming, so I removed it instead.