Propagate SIGTERM to `uv run` #6738

charliermarsh · 2024-08-28T02:48:53Z

Summary

Test Plan

(Needs testing.)

zanieb

Makes sense, should we be forwarding all signals?

charliermarsh · 2024-08-28T12:51:50Z

I don't think this forwards them per se, it just kills the process if we see them. Other signals might have different meanings, right...?

zanieb · 2024-08-28T13:07:08Z

Oh.. hm. I'll need to look closer.

zanieb · 2024-08-28T14:08:26Z

crates/uv/src/commands/tool/run.rs

    // signal handlers after the command completes.
    let _handler = tokio::spawn(async { while tokio::signal::ctrl_c().await.is_ok() {} });

-    let status = handle.wait().await.context("Child process disappeared")?;
+    let mut term_signal = signal(SignalKind::terminate())?;
+    let mut int_signal = signal(SignalKind::interrupt())?;


I think you need to remove the above ctrl_c() handler if we're going to replicate it here, right?

zanieb · 2024-08-28T14:09:45Z

Reading about this in https://unix.stackexchange.com/questions/176235/fork-and-how-signals-are-delivered-to-processes

Seems like both the child and parent should receive the signal already? That's why above we say:

// Ignore signals in the parent process, deferring them to the child.

And just eat all the SIGINT we receive in the parent process.

zanieb · 2024-08-28T14:14:00Z

More interesting discussion in docker/compose#10898 (comment) and docker/cli#4402 (comment)

I think that when the CLI receives a signal which is customarily used to ask processes to gracefully terminate, the CLI should oblige; but it's been pointed out that if the CLI were to forward signals to its children (which would be the easiest solution), then plugins would receive two signals in cases where the signal is sent to the entire process group (such as when pressing Ctrl+C in the terminal) - which would break the existing and, as I understand it, desirable behaviour of plugins (shutdown gracefully on first signal, shutdown fast on second).

zanieb · 2024-08-28T14:16:25Z

Basically — I think we should be forwarding all signals to the child process but we'll need an exception for SIGINT in an interactive session because it's probably already been sent to the child process by a shell. 😭

overfl0 · 2024-08-30T22:00:51Z

Hi @charliermarsh , I have tested this PR the same way as I stated in #6724 and sadly it doesn't seem to actually fix the issue 😬
The service still takes 10 seconds to stop, for some reason.

Maybe it's not about sigterm after all? I can't really say. All I know is that the service doesn't stop when you Ctrl+C the service started with docker compose

charliermarsh · 2024-08-30T22:09:59Z

Thanks @overfl0! I appreciate you trying it out. I can also repro the issue, though hadn't had a chance to test out this fix.

mvaled · 2024-09-03T07:31:28Z

@charliermarsh

I wonder if it wouldn't be better to replace the current process completely (a la exec). That would remove the need of meddling the signals.

Disclaimer: I'm not sure the about the overall design of uv run.

I see that rye used this exec approach (not using tokio, though):

/// Spawns a command exec style.
pub fn exec_spawn(cmd: &mut Command) -> Result<Infallible, Error> {
    // this is technically only necessary on windows
    crate::disable_ctrlc_handler();

    #[cfg(unix)]
    {
        use std::os::unix::process::CommandExt;
        let err = cmd.exec();
        Err(err.into())
    }
    #[cfg(windows)]
    {
        cmd.stdin(Stdio::inherit());
        let status = cmd.status()?;
        std::process::exit(status.code().unwrap())
    }
}

What would be the pros/cons of not using Tokio's Command and use the stdlib?

zanieb · 2024-09-03T17:36:24Z

That's also discussed briefly in #3095

bluss · 2024-09-04T10:40:16Z

crates/uv/src/commands/tool/run.rs

+
+        // `SIGINT`
+        _ = int_signal.recv() => {
+            handle.kill().await?;


Wouldn't it be important to preserve the signal kind, so still a SIGINT to the child?

pkucmus · 2024-09-13T07:55:26Z

I was pointed here from another issue with something that might be related. So just to leave it here, did you consider how a Uvicorn (or other webservers) deployed application might be used with uv? In the context of signal control of the server that is. Sending SIGHUP, SIGTTIN, SIGTTOU to your container to scale Uvicorn to the container from it's orchestration layer like Docker Swarm has meaning as only the PID 1 process in the container will receive the signal, in the case of uv run uv is running as PID 1 and would need to propagate at least as few signals to it's subprocess webserver for it to for example gracefully shut down.

Here's some context: https://www.kaggle.com/code/residentmario/best-practices-for-propagating-signals-on-docker

Anyway I don't think it's something I can help with but wanted to point it out - maybe it's a separate issue, don't know how you guys would rather track that but I think it's an important thing to consider. Thanks.

pkucmus · 2024-09-13T07:57:17Z

Maybe documenting this consideration would be a good idea?

cr-klarna · 2024-10-31T08:52:16Z

Hello all, wondering what's missing here for this to be folded into the next release? I see some checks are failing but also that it was approved. If this ensures some propagation it would be super helpful for my team to have as right now we see the same behavior in the original reference issue re: dockerized processes and we depend on being able to properly gracefully shutdown rather than have a forced kill after the sigterm wait period.

zanieb · 2024-10-31T13:24:45Z

This is missing a design and subsequent changes to address my comments at #6738 (comment)

overfl0 · 2024-10-31T13:46:13Z

@zanieb I think what's more important is that this PR doesn't actually fix the issue, see my comment below yours: #6738 (comment)
No idea why it doesn't work, but the fact is that when I built and tested this PR, I still experienced the old behaviour.

bluss · 2024-11-01T11:09:07Z

Basically — I think we should be forwarding all signals to the child process but we'll need an exception for SIGINT in an interactive session because it's probably already been sent to the child process by a shell. 😭

jupyter-client also sends signals to the process group, so I'm worried about that one too if running the ipykernel using uv run. I'm available to test any proposed changes for that scenario though. jupyter-client tells ipykernel (on non-windows) about the user desire to stop current computation using SIGINT by default.

cr-klarna · 2024-11-08T08:20:49Z

@zanieb I think what's more important is that this PR doesn't actually fix the issue, see my comment below yours: #6738 (comment) No idea why it doesn't work, but the fact is that when I built and tested this PR, I still experienced the old behaviour.

took another look at this and I also confirmed that when sending a signal it was not passed to the underlying process as expected. I now see what @zanieb mentioned with some interactive shells passing signals directly when done from an interactive context but I'm still not convinced that's a big deal really. at this point I'd rather receive double the signals than the current behavior

cr-klarna · 2024-11-08T09:36:00Z

A few things after exploring where things are going wrong. Keep in mind I do not know rust and am only looking at it because this tool is written in it; may need some help to confirm my findings.

Regarding why some of us see signals not being passed
- this is because the change was applied only to the tool/run.rs file and is also needed in the project/run.rs file as well.
Why we still see issues with process behavior when signals get handled
- this seems to be because afaict this tokio library does not care about proper signal handling and only exposes a kill API method which does exactly that. Not great, but I see some workarounds around the rust community
when point 2 is addressed this code still assumes too much on handling only one signal ever then exiting
- and would need to change to have some sort of looping to readily handle more than one signal when initial signals do not result in child termination

I am slow in rust as mentioned but I have been looking at this with earnest, though I recognize someone that actually knows this language will be much faster to take them if indeed these are the right things to address

shaneikennedy · 2024-11-08T11:46:32Z

@charliermarsh @zanieb sorry for the dup PR but didn't want to push to your branch (and not sure I could anyways) I added a few commits and I believe I have a fix ready here #8933, though I might need some guidance incase CI fails

@charliermarsh

## Summary This PR builds off of #6738 to fix #6724 (sorry for the new PR @charliermarsh I didn't want to push to your branch, not even sure if I could). The reason the original PR doesn't fix the issue described in #6724 is because the fastapi is ran in the project context (as I assume a lot of use cases are). This PR adds an extra commit to handle the signals in the project/run.rs file ~It also addresses the comment [here](https://github.com/astral-sh/uv/pull/6738/files#r1734757548) to not use the tokio ctrl-c method since we are now handling SIGINT ourselves~ update, tokio handles SIGINT in a platform agnostic way, intercepting this ouselves makes the logic more complicated with windows, decided to leave the tokio ctrl-c handler ~[This comment](https://github.com/astral-sh/uv/pull/6738/files#r1743510140) remains unaddressed, however, the Child process does not have any other methods besides kill() so I don't see how we can "preserve" the interrupt call :/ I tried looking around but no luck.~ updated, this PR is reduced to only handling SIGTERM propagation on unix machines, and the sigterm call to the child is preserved by making use of the nix package, instead of relying on tokio which only allowed for `kill()` on a child process ## Test Plan I tested this by building the docker container locally with these changes and tagging it "myuv", and then using that as the base image in uv-docker-example, (and ofc following the rest of the repro issues in #6724. In my tests I see that ctrl-c in the docker-compose up command exits the process almost immediately 👍 --------- Co-authored-by: Charlie Marsh <[email protected]>

There should be two functional changes here: - If we receive SIGINT twice, forward it to the child process - If the `uv run` child process changes its PGID, then forward SIGINT Previously, we never forwarded SIGINT to a child process. Instead, we relied on shell to do so. On Windows, we still do nothing but eat the Ctrl-C events we receive. I cannot see an easy way to send them to the child. The motivation for these changes should be explained in the comments. Closes #10952 (in which Ray changes its PGID) Replaces the (much simpler) #10989 with a more comprehensive approach. See #6738 (comment) for some previous context.

As I suspected quite some time ago (#6738 (comment)), it's problematic that we don't handle _every_ signal here. This PR adds handling for all of the Unix signals except `SIGCHLD`, `SIGIO`, and `SIGPOLL` which seem incorrect to forward. Also notable, we _cannot_ handle `SIGKILL` so if someone sends that to the PID instead of the PGID, they will leave dangling subprocesses. Instead, we could use `exec` and avoid this handling. However, we'd lose the ability to add nice error message on failure (e.g., as someone is trying to add in #12201) and, more critically, we'd need to figure out how to clean up resources properly (i.e., temporary directories) which currently happens on `Drop`. In the long-term, we'll probably want an option to use `exec` — but we'll need to figure out when to clean up resources or accept that they will dangle. This was last discussed in #3095 — discussion on that approach should continue there. A note on the implementation: I spent time time trying to write the handler using a tokio stream, so we could dynamically iterate over a list of signals instead of copy/pasting the implementation — I couldn't get it to work though and it didn't seem critical. Closes #12830

Propagate SIGTERM to uv run

50c1a97

zanieb approved these changes Aug 28, 2024

View reviewed changes

zanieb added the bug Something isn't working label Aug 28, 2024

zanieb reviewed Aug 28, 2024

View reviewed changes

bluss reviewed Sep 4, 2024

View reviewed changes

zanieb mentioned this pull request Sep 12, 2024

uv run spawns another process. #7343

Closed

zanieb mentioned this pull request Sep 17, 2024

support uv run invoked via subprocess #7449

Closed

zanieb mentioned this pull request Oct 29, 2024

uv run and SIGINT propagation #8654

Closed

shaneikennedy mentioned this pull request Nov 8, 2024

Handle sigterm calls, fixes #6724 #8933

Merged

zanieb closed this Nov 12, 2024

zanieb mentioned this pull request Jan 21, 2025

Child exit with signal n returns 128+n #10781

Merged

zanieb mentioned this pull request Jan 28, 2025

Improve SIGINT handling in uv run #11009

Merged

zanieb mentioned this pull request Apr 21, 2025

Forward additional signals to the child process in uv run #13017

Merged

Propagate SIGTERM to uv run #6738

Propagate SIGTERM to uv run #6738

Conversation

charliermarsh commented Aug 28, 2024

Summary

Test Plan

Uh oh!

zanieb left a comment

Choose a reason for hiding this comment

Uh oh!

charliermarsh commented Aug 28, 2024

Uh oh!

zanieb commented Aug 28, 2024

Uh oh!

zanieb Aug 28, 2024

Choose a reason for hiding this comment

Uh oh!

zanieb commented Aug 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zanieb commented Aug 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zanieb commented Aug 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

overfl0 commented Aug 30, 2024

Uh oh!

charliermarsh commented Aug 30, 2024

Uh oh!

mvaled commented Sep 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zanieb commented Sep 3, 2024

Uh oh!

bluss Sep 4, 2024

Choose a reason for hiding this comment

Uh oh!

pkucmus commented Sep 13, 2024

Uh oh!

pkucmus commented Sep 13, 2024

Uh oh!

cr-klarna commented Oct 31, 2024

Uh oh!

zanieb commented Oct 31, 2024

Uh oh!

overfl0 commented Oct 31, 2024

Uh oh!

bluss commented Nov 1, 2024

Uh oh!

cr-klarna commented Nov 8, 2024

Uh oh!

cr-klarna commented Nov 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

shaneikennedy commented Nov 8, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

Propagate SIGTERM to `uv run` #6738

Propagate SIGTERM to `uv run` #6738

zanieb commented Aug 28, 2024 •

edited

Loading

zanieb commented Aug 28, 2024 •

edited

Loading

zanieb commented Aug 28, 2024 •

edited

Loading

mvaled commented Sep 3, 2024 •

edited

Loading

cr-klarna commented Nov 8, 2024 •

edited

Loading