: stdio redirection #900

Closed

shayne-fletcher
Contributor

Differential Revision: D80366985

meta-cla bot added the CLA Signed label Aug 15, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D80366985

shayne-fletcher added a commit to shayne-fletcher/monarch-1 that referenced this pull request Aug 15, 2025
Summary:



Rollback Plan:

Differential Revision: D80366985
shayne-fletcher added a commit to shayne-fletcher/monarch-1 that referenced this pull request Aug 22, 2025
Summary:

rust startup code (code that runs before `main`) changes the disposition of `SIGPIPE` so that the signal is silently ignored (that is, it runs `signal(Signal::SIGPIPE, SigHandler::SigIgn)` or equivalent). this behavior, introduced in 2014, is poorly documented, but see rust-lang/rust#62569.

a task spawned in `hyperactor::signal_handler::GlobalSignalManager::new` creates an async signal listener using the `signal-hook-tokio` crate. it watches for `SIGINT` and `SIGTERM` and, on receiving one, executes cleanup code before removing the hooks and re-raising the signal so that the default behavior (process termination) is restored and executed. that signal handling code includes logging calls via `tracing::info!()` and `tracing::error!()`.

the problem is that if `SIGTERM` (say) is being handled by an orphan, the earlier death of the parent can mean the orphan's stdout/stderr pipes are closed. normally, writing to a closed pipe would raise `SIGPIPE` and terminate the process, but here a logging call results in an infinite uninterruptible sleep, hanging the process and preventing it from shutting down.
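
to illustrate the `SIGPIPE` disposition described above, here is a minimal sketch (not code from this diff) that writes to a pipe whose read end has been closed. `write_to_closed_pipe` is a hypothetical helper, and `pipe(2)` is declared by hand so that nothing beyond `std` is needed. it assumes a Unix target and a toolchain that accepts `unsafe extern` blocks. it only shows the plain `EPIPE` outcome and does not attempt to reproduce the hang reported here.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::os::raw::c_int;
use std::os::unix::io::FromRawFd;

unsafe extern "C" {
    // POSIX pipe(2), declared by hand so the sketch needs no external crates.
    fn pipe(fds: *mut c_int) -> c_int;
}

/// Hypothetical helper: create a pipe, close its read end, then write to the
/// write end and return whatever that write reports.
fn write_to_closed_pipe() -> io::Result<usize> {
    let mut fds: [c_int; 2] = [0; 2];
    // SAFETY: `fds` points to two writable c_ints, as pipe(2) requires.
    if unsafe { pipe(fds.as_mut_ptr()) } != 0 {
        return Err(io::Error::last_os_error());
    }
    // SAFETY: both descriptors were just created and are owned only here.
    let (reader, mut writer) =
        unsafe { (File::from_raw_fd(fds[0]), File::from_raw_fd(fds[1])) };
    // Closing the read end leaves the pipe with no reader.
    drop(reader);
    // With SIGPIPE ignored, this write fails with EPIPE instead of the
    // process being killed by the signal.
    writer.write(b"log line\n")
}
```

in an ordinary Rust binary the call is expected to return an error whose kind is `io::ErrorKind::BrokenPipe`; under the default `SIGPIPE` disposition the same write would terminate the process instead.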

this diff adds a call to a new function, `stdio_redirect::handle_broken_pipes()`, which detects this condition and redirects stdio to a file whose name is derived from the process ID (e.g. `monarch-process-exit-3529266.log`). in our testing so far, this overcomes the hangs, allowing processes to terminate and to write logs normally as they do so.
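
the detect-and-redirect idea can be sketched roughly as follows. this is an illustration under stated assumptions, not the implementation of `stdio_redirect::handle_broken_pipes()`: every name below (`pipe_is_broken`, `exit_log_name`, `redirect_fd`, `redirect_broken_stdio`) is hypothetical, the log directory is a parameter because the description does not say where the file goes, and the `POLLERR` value and `nfds` type assume Linux. `poll(2)` and `dup2(2)` are declared by hand to keep the sketch `std`-only.

```rust
use std::fs::OpenOptions;
use std::io;
use std::os::raw::{c_int, c_short, c_ulong};
use std::os::unix::io::AsRawFd;
use std::path::Path;

/// Mirror of C's `struct pollfd`.
#[repr(C)]
struct PollFd {
    fd: c_int,
    events: c_short,
    revents: c_short,
}

/// POLLERR as defined on Linux (assumption: Linux target).
const POLLERR: c_short = 0x008;

unsafe extern "C" {
    fn poll(fds: *mut PollFd, nfds: c_ulong, timeout: c_int) -> c_int;
    fn dup2(oldfd: c_int, newfd: c_int) -> c_int;
}

/// True if `fd` is the write end of a pipe whose read end has been closed.
/// On Linux, poll(2) reports POLLERR for such a descriptor.
fn pipe_is_broken(fd: c_int) -> bool {
    let mut pfd = PollFd { fd, events: 0, revents: 0 };
    // Zero timeout: sample the current state without blocking.
    let n = unsafe { poll(&mut pfd, 1, 0) };
    n > 0 && (pfd.revents & POLLERR) != 0
}

/// File name following the pattern given in the description.
fn exit_log_name(pid: u32) -> String {
    format!("monarch-process-exit-{pid}.log")
}

/// Make `fd` refer to `path` (created if missing, opened for append).
fn redirect_fd(fd: c_int, path: &Path) -> io::Result<()> {
    let file = OpenOptions::new().create(true).append(true).open(path)?;
    if unsafe { dup2(file.as_raw_fd(), fd) } < 0 {
        return Err(io::Error::last_os_error());
    }
    // `file` is dropped here; `fd` keeps the open file description alive.
    Ok(())
}

/// Redirect stdout/stderr to a per-process log file in `dir` if either one
/// is a broken pipe. Returns whether any redirection happened.
fn redirect_broken_stdio(dir: &Path) -> io::Result<bool> {
    let path = dir.join(exit_log_name(std::process::id()));
    let mut redirected = false;
    for fd in [1, 2] {
        if pipe_is_broken(fd) {
            redirect_fd(fd, &path)?;
            redirected = true;
        }
    }
    Ok(redirected)
}
```

a caller would invoke something like `redirect_broken_stdio(&std::env::temp_dir())` before the first log call in the shutdown path. passing `/dev/null` as the target of `redirect_fd` gives the discard-everything variant.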

this check will still race with pipe closure, though, so perhaps we should do something like this (e.g. redirect to `/dev/null`, if not to a file like this) and avoid doing IO completely during signal handling?

Reviewed By: mariusae

Differential Revision: D80366985
@facebook-github-bot
Contributor

This pull request has been merged in 0eb5e6c.
