feat: fastlane for phoenix presence_diff #1558

edgurgel · 2025-10-02T23:30:47Z

What kind of change does this PR introduce?

Support fastlane for phoenix presence_diff messages.

What is the current behavior?

presence_diff messages do not go through the fastlane because vanilla Phoenix.Presence uses Phoenix.Channel.Server.dispatch/3 as its dispatcher.

What is the new behavior?

Use a Phoenix fork that adds an option to pass a custom dispatcher to Phoenix.Presence. If the corresponding upstream Phoenix PR gets merged, we will only need to update Phoenix and drop the fork.

Another option is to copy Phoenix.Presence into this codebase and change the line that calls PubSub.local_broadcast.
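
For readers unfamiliar with PubSub dispatchers, here is a minimal sketch of the idea, not the module added by this PR. Phoenix.PubSub accepts a dispatcher module and calls its `dispatch/3` with the `{pid, value}` entries subscribed to a topic. The module name `PresenceDiffDispatcherSketch` and the `{:fastlane, transport_pid}` metadata shape are assumptions made for illustration; the real metadata registered by the channel may look different.

```elixir
defmodule PresenceDiffDispatcherSketch do
  @moduledoc """
  Hypothetical dispatcher used only to illustrate the mechanism.
  It has no dependency on Phoenix so it can be run on its own.
  """

  # entries: list of {subscriber_pid, metadata} registered for the topic
  # from:    pid of the broadcaster, or :none for local broadcasts
  # message: the payload to deliver (for example a presence_diff broadcast)
  def dispatch(entries, _from, message) do
    Enum.each(entries, fn
      # Assumed metadata shape: the subscriber registered a transport pid,
      # so the message skips the channel process and goes straight to it.
      {_channel_pid, {:fastlane, transport_pid}} ->
        send(transport_pid, {:fastlane, message})

      # No fastlane metadata: fall back to the channel process itself.
      {channel_pid, _metadata} ->
        send(channel_pid, message)
    end)

    # PubSub dispatchers are expected to return :ok.
    :ok
  end
end
```

With a dispatcher like this, a call such as `Phoenix.PubSub.local_broadcast(pubsub, topic, message, PresenceDiffDispatcherSketch)` would route the message through the custom `dispatch/3` instead of the default one. The fork is needed because vanilla Phoenix.Presence does not expose a way to choose that last argument.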


coveralls · 2025-10-02T23:49:52Z

coverage: 85.477% (-0.03%) from 85.506%
when pulling cce558e on feat/phoenix-presence-dispatcher
into 07de665 on main.

It uses a fork of Phoenix for the time being

* fix: runtime setup error (supabase#1520)
* fix: use primary instead of replica on rename_settings_field (supabase#1521)
* feat: upgrade cowboy & ranch (supabase#1523)
* fix: Fix GenRpc to not try to connect to nodes that are not alive (supabase#1525)
* fix: enable presence on track message (supabase#1527)
  Currently the user needs to have presence enabled from the beginning of the channel. This change lets users enable presence later in the flow by sending a track message, which enables presence messages for them.
* fix: set cowboy active_n=100 as cowboy 2.12.0 (supabase#1530)
  cowboy 2.13.0 set the default active_n=1
* fix: provide error_code metadata on RealtimeChannel.Logging (supabase#1531)
* feat: disable UTF8 validation on websocket frames (supabase#1532)
  Currently all text frames are handled only with JSON, which already requires UTF-8
* fix: move DB setup to happen after Connect.init (supabase#1533)
  This change reduces the impact of slow DB setup on other tenants that land on the same partition and try to connect at the same time
* fix: handle wal bloat (supabase#1528)
  Verify that the replication connection is able to reconnect when faced with WAL bloat issues
* feat: replay realtime.messages (supabase#1526)
  A new index was created on inserted_at DESC, topic WHERE private IS TRUE AND extension = "broadcast". The hardcoded limit is 25 for now.
* feat: gen_rpc pub sub adapter (supabase#1529)
  Add a PubSub adapter that uses gen_rpc to send messages to other nodes. It uses :gen_rpc.abcast/3 instead of :erlang.send/2.
  The adapter works very similarly to the PG2 adapter. It consists of multiple workers that forward to the local node using PubSub.local_broadcast. The worker is chosen based on the sending process, just like the PG2 adapter does.
  The number of workers is controlled by `:pool_size` or `:broadcast_pool_size`. This distinction exists because Phoenix.PubSub uses `:pool_size` to define how many partitions the PubSub registry will use. It's possible to control them separately by using `:broadcast_pool_size`.
* fix: ensure message id doesn't raise on non-map payloads (supabase#1534)
* fix: match error on Connect (supabase#1536)
  Co-authored-by: Eduardo Gurgel Pinho
* feat: websocket max heap size configuration (supabase#1538)
  * fix: set max process heap size to 500MB instead of 8GB
  * feat: set websocket transport max heap size
    WEBSOCKET_MAX_HEAP_SIZE can be used to configure it
* fix: update gen_rpc to fix gen_rpc_dispatcher issues (supabase#1537)
  Issues:
  * A single gen_rpc_dispatcher can be a bottleneck if connecting takes some time
  * Many calls can land on the dispatcher when the node might already be gone. If we don't validate the node, it might keep trying to connect until it times out instead of quickly giving up because the node is not actively connected.
* fix: improve ErlSysMon logging for processes (supabase#1540)
  Include initial_call, ancestors, registered_name, message_queue_len and total_heap_size. Also bump long_schedule and long_gc.
* fix: make pubsub adapter configurable (supabase#1539)
* fix: specify that only private channels are allowed when replaying messages (supabase#1543)
* fix: rate limit connect module (supabase#1541)
  On bad connections, we rate limit the Connect module to prevent abuse and excessive error logging
* build: automatically cancel old tests/build on new push (supabase#1545)
  Currently, whenever you push a commit to your branch, the old builds keep running and a new build is started. Once a new commit is added, the old test results no longer matter and are just a waste of CI resources. This also reduces confusion from multiple builds running in parallel for the same branch and possibly blocking merges.
  With this change, whenever a new commit is added, the previous build is immediately canceled and only the build for the latest commit runs.
* fix: move message queue data to off-heap for gen_rpc pub sub workers (supabase#1548)
* fix: rate limit Connect.lookup_or_start_connection on error only (supabase#1549)
* fix: increase connect error rate window to 30 seconds (supabase#1550)
* fix: set a lower fullsweep_after flag for GenRpcPubSub workers (supabase#1551)
* fix: hardcode presence limit (supabase#1552)
* fix: further decrease limit on presence events (supabase#1553)
* fix: bump up realtime (supabase#1554)
* fix: lower rate limit to 100 events per second (supabase#1556)
* fix: move connect rate limit to socket (supabase#1555)
* fix: reduce max_frame_size to 5MB
* fix: fullsweep_after=100 on gen rpc pub sub workers
  Co-authored-by: Eduardo Gurgel Pinho
* fix: collect global metrics without tenant tagging (supabase#1557)
* feat: presence payload size (supabase#1559)
  * Also tweak buckets to account all the way to 3000KB
  * Start tagging the payload size metrics with message_type. message_type can be presence, broadcast or postgres_changes
* fix: use GenRpc for Realtime.Latency pings (supabase#1560)
* Fastlane for phoenix presence_diff (supabase#1558)
  It uses a fork of Phoenix for the time being
  * fix: count presence_diff events on MessageDispatcher
  * fix: remove traces from console during development

Co-authored-by: Filipe Cabaço, Eduardo Gurgel, Kevin Grüneberg, Bradley Haljendi

@h0lybyte

* 🔄 Sync with upstream changes (#2)
  * chore: fix couple of flaky tests (supabase#1517)
  * fix: Improve runtime setup logic (supabase#1511)
    Clean up runtime.exs logic to be more organized and easier to maintain
  * fix: runtime setup error (supabase#1520)
  Co-authored-by: Eduardo Gurgel, Filipe Cabaço
* 🔄 Sync with upstream changes (#4)
  * fix: runtime setup error (supabase#1520)
  * fix: use primary instead of replica on rename_settings_field (supabase#1521)
  Co-authored-by: Filipe Cabaço, Eduardo Gurgel, Bradley Haljendi
* 🔄 Sync with upstream changes (#6)
  * fix: runtime setup error (supabase#1520)
  * fix: use primary instead of replica on rename_settings_field (supabase#1521)
  Co-authored-by: Filipe Cabaço, Eduardo Gurgel
* 🔄 Sync with upstream changes (#7)
  Repeats the entries of the commit message above, from fix: runtime setup error (supabase#1520) through fix: make pubsub adapter configurable (supabase#1539).
  Co-authored-by: Filipe Cabaço, Eduardo Gurgel, Bradley Haljendi
* 🔄 Sync with upstream changes (#9)
  Same entries as (#7).
  Co-authored-by: Filipe Cabaço, Eduardo Gurgel, Bradley Haljendi
* 🔄 Sync with upstream changes (#11)
  Same entries as (#7), plus fix: specify that only private channels are allowed when replaying messages (supabase#1543) and fix: rate limit connect module (supabase#1541).
  Co-authored-by: Filipe Cabaço, Eduardo Gurgel, Bradley Haljendi
* 🔄 Sync with upstream changes (#13)
  Repeats every entry of the commit message above, from fix: runtime setup error (supabase#1520) through Fastlane for phoenix presence_diff (supabase#1558).
  Co-authored-by: Filipe Cabaço, Eduardo Gurgel, Kevin Grüneberg, Bradley Haljendi

Co-authored-by: Al @h0lybyte, Eduardo Gurgel, Filipe Cabaço, Bradley Haljendi, Kevin Grüneberg

filipecabaco approved these changes Oct 6, 2025


edgurgel added 4 commits October 7, 2025 09:37

feat: use custom message dispatcher for presence diff messages fastlane

b0e23ed

It uses a fork of Phoenix for the time being

fix: count presence_diff events on MessageDispatcher

657cd1b

fix: remove traces from console during development

bb73eb4

fix: test

c624928

edgurgel force-pushed the feat/phoenix-presence-dispatcher branch from 3eed801 to c624928 October 6, 2025 20:46

fixup! fix: count presence_diff events on MessageDispatcher

cce558e

edgurgel merged commit ecac071 into main Oct 6, 2025
5 of 7 checks passed

edgurgel deleted the feat/phoenix-presence-dispatcher branch October 6, 2025 23:21

edgurgel changed the title ~~Fastlane for phoenix presence_diff~~ feat: fastlane for phoenix presence_diff Oct 7, 2025
