[Postgres] Remove usage of pg_logical_slot_peek_binary_changes #390

rkistner · 2025-11-10T14:59:11Z

Background

For Postgres, pg_logical_slot_peek_binary_changes was used as a "health check" before starting replication, to confirm that we can actually read from the replication slot. For example, if the related publication was deleted, we'd get a publication "powersync" does not exist error (pg < 18 at least).

However, there are are scenarios where the approach could add very high load on the source database. To explain this, we need some background on logical decoding in Postgres:

Postgres Logical Decoding

In Postgres, all changes are written to the WAL file. A global LSN is used to track position in the WAL file.
Each replication slot keeps track of the LSN of the last transaction that was replicated by the client (such as PowerSync).

When the client requests changes via logical replication, either via a streaming connection or via queries such as pg_logical_slot_peek_binary_changes, a process called logical decoding decodes data from the WAL, filters it according to the publication configuration and client settings, and streams the decoded data to the client. In some cases, Postgres has to read through a lot of irrelevant data before it gets to the data relevant to the client. Some examples include:

Transactions that were rolled back. These are persisted in the WAL, but not replicated to the client.
Changes to tables not part of the publication.
Changes filtered out by row filters in the publication (not often used with PowerSync).

So what could happen is that a sequence of multiple gigabytes of data is written in the WAL, that are not replicated to the client (PowerSync Service). Now each time the pg_logical_slot_peek_binary_changes query runs, it attempts to scan through all of that data, but times out before it could find data to return to the client. Since it could not return data to the client, the client could not advance the slot’s LSN, causing it to repeatedly scan through the same data.

Reproducing the issue

We can generate large amounts of WAL data using a script like this on the source database (can past in psql):

SET statement_timeout = 40000;
BEGIN;
DO $$
DECLARE i int;
BEGIN
  FOR i IN 1..200000 LOOP
    -- set name to a random string of +- 200kb, on all rows in `lists`
    UPDATE lists SET name = (SELECT string_agg(md5(random()::text), '') FROM generate_series(1, 6553));
  END LOOP;
END$$;
ROLLBACK;

The actual update query is not important - the point is that we generate large volumes of data in the WAL (this one generates about 1GB of WAL data in the 40s limit in my test db), and then do a ROLLBACK to ensure the data is not actually replicated.

A couple of runs of the above while the PowerSync service is paused is enough to trigger a statement timeout on pg_logical_slot_peek_binary_changes. That would cause an indefinite retry loop, adding large CPU and IOPS load on the source database.

A further complication is that statements that timed out do not show up in pg_stat_statements (the "slow query log" for Postgres), making the performance issue difficult to diagnose.

The fix

The fix is essentially just removing the pg_logical_slot_peek_binary_changes query. Instead, we rely on two different mechanisms to check the slot health:

The wal_status column from pg_replication_slots (available in Postgres 13+) - this covers the typical cases like max_slot_wal_keep_size exceeded. Support for this check was added in [Postgres] Improve handling of "lost" replication slots #387.
We check for errors like publication "powersync" does not exist as part of the actual logical replication stream.

This should cover the same scenarios as before, but is more robust, and may reduce replication startup times in some cases.

Specifically, if there are multiple large reverted transactions in a sequence (the test case above):

pg_logical_slot_peek_binary_changes would just time out.
Logical replication would get periodic messages on the replication stream while scanning through this.
Replication progress (the slot's restart_lsn) would advance after scanning through each transaction, despite no actual data being streamed.

Alternatives considered

I considered just changing how we use pg_logical_slot_peek_binary_changes:

Reduce the timeout to a shorter value, such as 5s.
If the timeout was reached, just log a warning and continue trying to replicate, instead of restarting the health check.

However, it does not appear that this check actually adds value over just starting streaming, so I removed it rather.

changeset-bot · 2025-11-10T14:59:15Z

🦋 Changeset detected

Latest commit: f7b752f

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 11 packages

Name	Type
@powersync/service-module-postgres	Patch
@powersync/service-core	Patch
@powersync/service-image	Patch
@powersync/service-schema	Patch
@powersync/service-core-tests	Patch
@powersync/service-module-core	Patch
@powersync/service-module-mongodb-storage	Patch
@powersync/service-module-mongodb	Patch
@powersync/service-module-mysql	Patch
@powersync/service-module-postgres-storage	Patch
test-client	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

stevensJourney

That's an excellent PR description. This looks good to me.

rkistner added 4 commits November 10, 2025 15:35

Remove usage of pg_logical_slot_peek_binary_changes.

c68189e

More comments.

85bb599

Clean up error handling.

5d0daba

Add changeset.

f7b752f

rkistner requested a review from stevensJourney November 10, 2025 14:59

rkistner changed the title ~~Remove usage of pg_logical_slot_peek_binary_changes~~ [Postgres] Remove usage of pg_logical_slot_peek_binary_changes Nov 10, 2025

stevensJourney approved these changes Nov 11, 2025

View reviewed changes

rkistner merged commit 7eb7957 into main Nov 11, 2025
22 checks passed

rkistner deleted the pg-remove-peek-changes branch November 11, 2025 09:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Postgres] Remove usage of pg_logical_slot_peek_binary_changes #390

[Postgres] Remove usage of pg_logical_slot_peek_binary_changes #390

Uh oh!

rkistner commented Nov 10, 2025

Uh oh!

changeset-bot bot commented Nov 10, 2025

Uh oh!

stevensJourney left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[Postgres] Remove usage of pg_logical_slot_peek_binary_changes #390

[Postgres] Remove usage of pg_logical_slot_peek_binary_changes #390

Uh oh!

Conversation

rkistner commented Nov 10, 2025

Background

Postgres Logical Decoding

Reproducing the issue

The fix

Alternatives considered

Uh oh!

changeset-bot bot commented Nov 10, 2025

🦋 Changeset detected

Uh oh!

stevensJourney left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants