Skip to content

Conversation

MadLittleMods
Copy link
Contributor

@MadLittleMods MadLittleMods commented Sep 5, 2025

Spawning from #18871

This change was originally used to fix CPU time going backwards when we daemonize.

While, we don't seem to run into this problem on develop, I still think this is a good change to make. We don't need background tasks running on a process that will soon be forcefully exited and where the reactor isn't even running yet. We now kick off the background tasks (run_as_background_process) after we have forked the process and started the reactor.

Also as simple note, we don't need background tasks running in both halves of a fork.

Testing strategy:

  1. Run with daemonize: true in your homeserver.yaml
  2. poetry run synapse_homeserver --config-path homeserver.yaml
  3. Shutdown the server
  4. Look for any bad log entries in your homeserver logs:
    • utime went backwards!/stime went backwards!

Bad log examples from the other PR (to be clear I wasn't seeing this on develop or with this change):

synapse.logging.context - ERROR - _schedule_next_expiry-0 - utime went backwards! 0.050467 < 0.886526
synapse.logging.context - ERROR - _schedule_db_events-0 - stime went backwards! 0.009941 < 0.155106
synapse.logging.context - ERROR - wake_destinations_needing_catchup-0 - stime went backwards! 0.010175 < 0.130923
synapse.logging.context - ERROR - resume_sync_partial_state_room-0 - utime went backwards! 0.052898 < 0.886526

Dev notes

startup

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

When we `daemonize`, we fork the process and cputime metrics get confused
about the per-thread resource usage appearing to go backwards because we're
comparing the resource usage (`rusage`) from the original process to the
forked process.

We now kick off the background tasks (`run_as_background_process`) after we
have forked the process so the `rusage` we record when we `start` is in the
same thread when we `stop`.

Bad log examples from before:
```
synapse.logging.context - ERROR - _schedule_next_expiry-0 - utime went backwards! 0.050467 < 0.886526
synapse.logging.context - ERROR - _schedule_db_events-0 - stime went backwards! 0.009941 < 0.155106
synapse.logging.context - ERROR - wake_destinations_needing_catchup-0 - stime went backwards! 0.010175 < 0.130923
synapse.logging.context - ERROR - resume_sync_partial_state_room-0 - utime went backwards! 0.052898 < 0.886526
```

Testing strategy:

 1. Run with `daemonize: true` in your `homeserver.yaml`
 1. `poetry run synapse_homeserver --config-path homeserver.yaml`
 1. Shutdown the server
 1. Look for any bad log entries in your homeserver logs:
    - `Expected logging context sentinel but found main`
    - `Expected logging context main was lost`
    - `utime went backwards!`/`stime went backwards!`
Comment on lines -369 to -373
# Register background tasks required by this server. This must be done
# somewhat manually due to the background tasks not being registered
# unless handlers are instantiated.
if self.config.worker.run_background_tasks:
self.setup_background_tasks()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of where we moved this code from and to:

Relevant starting point is here:

def main() -> None:
with LoggingContext("main"):
# check base requirements
check_requirements()
hs = setup(sys.argv[1:])
# redirect stdio to the logs, if configured.
if not hs.config.logging.no_redirect_stdio:
redirect_stdio_to_logs()
run(hs)

The main thing to see here is setup() vs run(). We fork the process in run() and then start the reactor after.

Previously, we started the background tasks in the setup() phase.

Now we start the background tasks in start which happens "once the reactor is running".

"""

def setup_background_tasks(self) -> None:
def start_background_tasks(self) -> None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a rename to align with it's new home in start(). I think either could fit but this might be more straightforward to understand.

Comment on lines +627 to 634
# TODO: This should be moved to same pattern we use for other background tasks:
# Add to `REQUIRED_ON_BACKGROUND_TASK_STARTUP` and rely on
# `start_background_tasks` to start it.
await hs.get_common_usage_metrics_manager().setup()

# TODO: This feels like another pattern that should refactored as one of the
# `REQUIRED_ON_BACKGROUND_TASK_STARTUP`
start_phone_stats_home(hs)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are future refactors to do.

But given we're already starting a few snowflake background tasks in start(), this is just is more evidence why it makes sense to start background tasks here vs in setup()

Comment on lines +123 to +128
# This will cause all of the relevant storage classes to be instantiated and call
# `register_background_update_handler(...)`,
# `register_background_index_update(...)`,
# `register_background_validate_constraint(...)`, etc so they are available to use
# if we are asked to run those background updates.
hs.get_storage_controllers()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously this was handled in an extremely tenuous fashion because previously hs.setup() used to call hs.start_background_tasks() which instantiated some handlers which ended up instantiating the storage controllers at some point.

Now we just explicitly call this

@MadLittleMods MadLittleMods marked this pull request as ready for review September 9, 2025 05:17
@MadLittleMods MadLittleMods requested a review from a team as a code owner September 9, 2025 05:17
Copy link
Contributor

@reivilibre reivilibre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, definitely not a good thing to be running background tasks in both halves of a fork! :-)

@MadLittleMods MadLittleMods merged commit ca655e4 into develop Sep 9, 2025
44 checks passed
@MadLittleMods MadLittleMods deleted the madlittlemods/start-background-tasks-after-daemonize branch September 9, 2025 15:10
@MadLittleMods
Copy link
Contributor Author

Thanks for the review @reivilibre 🐿️

MadLittleMods added a commit that referenced this pull request Oct 3, 2025
This was originally removed in #18886
but it looks like it snuck back in #18828
during a bad merge
MadLittleMods added a commit that referenced this pull request Oct 7, 2025
This was originally removed in
#18886 but it looks like it
snuck back in #18828 during a
[bad
merge](4cd3d91).

Noticed while looking at Synapse setup and startup (just by happen
stance).

I don't think this has adverse effects on Synapse actually working and
`start_background_tasks()` can be called multiple times.


### Is there a good way to audit all of these merges?

As I would like to see the conflicts for each merge.

This works but it's still hard to notice anything is wrong:

```
git log --remerge-diff <commit-sha>
```

> shows the difference from mechanical merge result and the result that
is actually recorded in a merge commit

via
https://stackoverflow.com/questions/15277708/how-do-you-see-show-a-git-merge-conflict-resolution-that-was-done-given-a-mer/71181334#71181334

The following better. Specify the version range to the commit right
before the merge to the merge. And can even specify which file to look
at to make it more obvious with the hindsight we have now.

```
git log --remerge-diff <merge-commit-sha>~1..<merge-commit-sha> -- synapse/server.py
```

Example:
```
git log --remerge-diff 4cd3d91~1..4cd3d91 -- synapse/server.py
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants