-
Notifications
You must be signed in to change notification settings - Fork 396
Start background tasks after we fork the process (daemonize) #18886
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Start background tasks after we fork the process (daemonize) #18886
Conversation
When we `daemonize`, we fork the process and cputime metrics get confused about the per-thread resource usage appearing to go backwards because we're comparing the resource usage (`rusage`) from the original process to the forked process. We now kick off the background tasks (`run_as_background_process`) after we have forked the process so the `rusage` we record when we `start` is in the same thread when we `stop`. Bad log examples from before: ``` synapse.logging.context - ERROR - _schedule_next_expiry-0 - utime went backwards! 0.050467 < 0.886526 synapse.logging.context - ERROR - _schedule_db_events-0 - stime went backwards! 0.009941 < 0.155106 synapse.logging.context - ERROR - wake_destinations_needing_catchup-0 - stime went backwards! 0.010175 < 0.130923 synapse.logging.context - ERROR - resume_sync_partial_state_room-0 - utime went backwards! 0.052898 < 0.886526 ``` Testing strategy: 1. Run with `daemonize: true` in your `homeserver.yaml` 1. `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Shutdown the server 1. Look for any bad log entries in your homeserver logs: - `Expected logging context sentinel but found main` - `Expected logging context main was lost` - `utime went backwards!`/`stime went backwards!`
# Register background tasks required by this server. This must be done | ||
# somewhat manually due to the background tasks not being registered | ||
# unless handlers are instantiated. | ||
if self.config.worker.run_background_tasks: | ||
self.setup_background_tasks() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of where we moved this code from and to:
Relevant starting point is here:
synapse/synapse/app/homeserver.py
Lines 407 to 417 in b2997a8
def main() -> None: | |
with LoggingContext("main"): | |
# check base requirements | |
check_requirements() | |
hs = setup(sys.argv[1:]) | |
# redirect stdio to the logs, if configured. | |
if not hs.config.logging.no_redirect_stdio: | |
redirect_stdio_to_logs() | |
run(hs) |
The main thing to see here is setup()
vs run()
. We fork the process in run()
and then start the reactor after.
Previously, we started the background tasks in the setup()
phase.
Now we start the background tasks in start
which happens "once the reactor is running".
""" | ||
|
||
def setup_background_tasks(self) -> None: | ||
def start_background_tasks(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a rename to align with it's new home in start()
. I think either could fit but this might be more straightforward to understand.
# TODO: This should be moved to same pattern we use for other background tasks: | ||
# Add to `REQUIRED_ON_BACKGROUND_TASK_STARTUP` and rely on | ||
# `start_background_tasks` to start it. | ||
await hs.get_common_usage_metrics_manager().setup() | ||
|
||
# TODO: This feels like another pattern that should refactored as one of the | ||
# `REQUIRED_ON_BACKGROUND_TASK_STARTUP` | ||
start_phone_stats_home(hs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are future refactors to do.
But given we're already starting a few snowflake background tasks in start()
, this is just is more evidence why it makes sense to start background tasks here vs in setup()
…base.py` so portdb can maybe finish See #18886 (comment)
…pse_database.py` so portdb can maybe finish" This reverts commit 11c39c5.
… registered are available
# This will cause all of the relevant storage classes to be instantiated and call | ||
# `register_background_update_handler(...)`, | ||
# `register_background_index_update(...)`, | ||
# `register_background_validate_constraint(...)`, etc so they are available to use | ||
# if we are asked to run those background updates. | ||
hs.get_storage_controllers() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously this was handled in an extremely tenuous fashion because previously hs.setup()
used to call hs.start_background_tasks()
which instantiated some handlers which ended up instantiating the storage controllers at some point.
Now we just explicitly call this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, definitely not a good thing to be running background tasks in both halves of a fork! :-)
Thanks for the review @reivilibre 🐿️ |
This was originally removed in #18886 but it looks like it snuck back in #18828 during a [bad merge](4cd3d91). Noticed while looking at Synapse setup and startup (just by happen stance). I don't think this has adverse effects on Synapse actually working and `start_background_tasks()` can be called multiple times. ### Is there a good way to audit all of these merges? As I would like to see the conflicts for each merge. This works but it's still hard to notice anything is wrong: ``` git log --remerge-diff <commit-sha> ``` > shows the difference from mechanical merge result and the result that is actually recorded in a merge commit via https://stackoverflow.com/questions/15277708/how-do-you-see-show-a-git-merge-conflict-resolution-that-was-done-given-a-mer/71181334#71181334 The following better. Specify the version range to the commit right before the merge to the merge. And can even specify which file to look at to make it more obvious with the hindsight we have now. ``` git log --remerge-diff <merge-commit-sha>~1..<merge-commit-sha> -- synapse/server.py ``` Example: ``` git log --remerge-diff 4cd3d91~1..4cd3d91 -- synapse/server.py ```
Spawning from #18871
This change was originally used to fix CPU time going backwards when we
daemonize
.While, we don't seem to run into this problem on
develop
, I still think this is a good change to make. We don't need background tasks running on a process that will soon be forcefully exited and where the reactor isn't even running yet. We now kick off the background tasks (run_as_background_process
) after we have forked the process and started the reactor.Also as simple note, we don't need background tasks running in both halves of a fork.
Testing strategy:
daemonize: true
in yourhomeserver.yaml
poetry run synapse_homeserver --config-path homeserver.yaml
utime went backwards!
/stime went backwards!
Bad log examples from the other PR (to be clear I wasn't seeing this on
develop
or with this change):Dev notes
startup
Pull Request Checklist
EventStore
toEventWorkerStore
.".code blocks
.