Skip to content

Conversation

@charliermarsh
Copy link
Member

Summary

See: #6724

Test Plan

(Needs testing.)

Copy link
Member

@zanieb zanieb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, should we be forwarding all signals?

@zanieb zanieb added the bug Something isn't working label Aug 28, 2024
@charliermarsh
Copy link
Member Author

I don't think this forwards them per se, it just kills the process if we see them. Other signals might have different meanings, right...?

@zanieb
Copy link
Member

zanieb commented Aug 28, 2024

Oh.. hm. I'll need to look closer.

Comment on lines 216 to +220
// signal handlers after the command completes.
let _handler = tokio::spawn(async { while tokio::signal::ctrl_c().await.is_ok() {} });

let status = handle.wait().await.context("Child process disappeared")?;
let mut term_signal = signal(SignalKind::terminate())?;
let mut int_signal = signal(SignalKind::interrupt())?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to remove the above ctrl_c() handler if we're going to replicate it here, right?

@zanieb
Copy link
Member

zanieb commented Aug 28, 2024

Reading about this in https://unix.stackexchange.com/questions/176235/fork-and-how-signals-are-delivered-to-processes

Seems like both the child and parent should receive the signal already? That's why above we say:

// Ignore signals in the parent process, deferring them to the child.

And just eat all the SIGINT we receive in the parent process.

@zanieb
Copy link
Member

zanieb commented Aug 28, 2024

More interesting discussion in docker/compose#10898 (comment) and docker/cli#4402 (comment)

I think that when the CLI receives a signal which is customarily used to ask processes to gracefully terminate, the CLI should oblige; but it's been pointed out that if the CLI were to forward signals to its children (which would be the easiest solution), then plugins would receive two signals in cases where the signal is sent to the entire process group (such as when pressing Ctrl+C in the terminal) - which would break the existing and, as I understand it, desirable behaviour of plugins (shutdown gracefully on first signal, shutdown fast on second).

@zanieb
Copy link
Member

zanieb commented Aug 28, 2024

Basically — I think we should be forwarding all signals to the child process but we'll need an exception for SIGINT in an interactive session because it's probably already been sent to the child process by a shell. 😭

@overfl0
Copy link

overfl0 commented Aug 30, 2024

Hi @charliermarsh , I have tested this PR the same way as I stated in #6724 and sadly it doesn't seem to actually fix the issue 😬
The service still takes 10 seconds to stop, for some reason.

Maybe it's not about sigterm after all? I can't really say. All I know is that the service doesn't stop when you Ctrl+C the service started with docker compose

@charliermarsh
Copy link
Member Author

Thanks @overfl0! I appreciate you trying it out. I can also repro the issue, though hadn't had a chance to test out this fix.

@mvaled
Copy link

mvaled commented Sep 3, 2024

@charliermarsh

I wonder if it wouldn't be better to replace the current process completely (a la exec). That would remove the need of meddling the signals.

Disclaimer: I'm not sure the about the overall design of uv run.

I see that rye used this exec approach (not using tokio, though):

/// Spawns a command exec style.
pub fn exec_spawn(cmd: &mut Command) -> Result<Infallible, Error> {
    // this is technically only necessary on windows
    crate::disable_ctrlc_handler();

    #[cfg(unix)]
    {
        use std::os::unix::process::CommandExt;
        let err = cmd.exec();
        Err(err.into())
    }
    #[cfg(windows)]
    {
        cmd.stdin(Stdio::inherit());
        let status = cmd.status()?;
        std::process::exit(status.code().unwrap())
    }
}

What would be the pros/cons of not using Tokio's Command and use the stdlib?

@zanieb
Copy link
Member

zanieb commented Sep 3, 2024

That's also discussed briefly in #3095


// `SIGINT`
_ = int_signal.recv() => {
handle.kill().await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be important to preserve the signal kind, so still a SIGINT to the child?

@pkucmus
Copy link

pkucmus commented Sep 13, 2024

I was pointed here from another issue with something that might be related. So just to leave it here, did you consider how a Uvicorn (or other webservers) deployed application might be used with uv? In the context of signal control of the server that is. Sending SIGHUP, SIGTTIN, SIGTTOU to your container to scale Uvicorn to the container from it's orchestration layer like Docker Swarm has meaning as only the PID 1 process in the container will receive the signal, in the case of uv run uv is running as PID 1 and would need to propagate at least as few signals to it's subprocess webserver for it to for example gracefully shut down.

Here's some context: https://www.kaggle.com/code/residentmario/best-practices-for-propagating-signals-on-docker

Anyway I don't think it's something I can help with but wanted to point it out - maybe it's a separate issue, don't know how you guys would rather track that but I think it's an important thing to consider. Thanks.

@pkucmus
Copy link

pkucmus commented Sep 13, 2024

Maybe documenting this consideration would be a good idea?

@cr-klarna
Copy link

Hello all, wondering what's missing here for this to be folded into the next release? I see some checks are failing but also that it was approved. If this ensures some propagation it would be super helpful for my team to have as right now we see the same behavior in the original reference issue re: dockerized processes and we depend on being able to properly gracefully shutdown rather than have a forced kill after the sigterm wait period.

@zanieb
Copy link
Member

zanieb commented Oct 31, 2024

This is missing a design and subsequent changes to address my comments at #6738 (comment)

@overfl0
Copy link

overfl0 commented Oct 31, 2024

@zanieb I think what's more important is that this PR doesn't actually fix the issue, see my comment below yours: #6738 (comment)
No idea why it doesn't work, but the fact is that when I built and tested this PR, I still experienced the old behaviour.

@bluss
Copy link
Contributor

bluss commented Nov 1, 2024

Basically — I think we should be forwarding all signals to the child process but we'll need an exception for SIGINT in an interactive session because it's probably already been sent to the child process by a shell. 😭

jupyter-client also sends signals to the process group, so I'm worried about that one too if running the ipykernel using uv run. I'm available to test any proposed changes for that scenario though. jupyter-client tells ipykernel (on non-windows) about the user desire to stop current computation using SIGINT by default.

@cr-klarna
Copy link

@zanieb I think what's more important is that this PR doesn't actually fix the issue, see my comment below yours: #6738 (comment) No idea why it doesn't work, but the fact is that when I built and tested this PR, I still experienced the old behaviour.

took another look at this and I also confirmed that when sending a signal it was not passed to the underlying process as expected. I now see what @zanieb mentioned with some interactive shells passing signals directly when done from an interactive context but I'm still not convinced that's a big deal really. at this point I'd rather receive double the signals than the current behavior

@cr-klarna
Copy link

cr-klarna commented Nov 8, 2024

A few things after exploring where things are going wrong. Keep in mind I do not know rust and am only looking at it because this tool is written in it; may need some help to confirm my findings.

  1. Regarding why some of us see signals not being passed
    • this is because the change was applied only to the tool/run.rs file and is also needed in the project/run.rs file as well.
  2. Why we still see issues with process behavior when signals get handled
    • this seems to be because afaict this tokio library does not care about proper signal handling and only exposes a kill API method which does exactly that. Not great, but I see some workarounds around the rust community
  3. when point 2 is addressed this code still assumes too much on handling only one signal ever then exiting
    • and would need to change to have some sort of looping to readily handle more than one signal when initial signals do not result in child termination

I am slow in rust as mentioned but I have been looking at this with earnest, though I recognize someone that actually knows this language will be much faster to take them if indeed these are the right things to address

@shaneikennedy
Copy link
Contributor

@charliermarsh @zanieb sorry for the dup PR but didn't want to push to your branch (and not sure I could anyways) I added a few commits and I believe I have a fix ready here #8933, though I might need some guidance incase CI fails

@zanieb zanieb closed this Nov 12, 2024
zanieb pushed a commit that referenced this pull request Nov 12, 2024
<!--
Thank you for contributing to uv! To help us out with reviewing, please
consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

This PR builds off of #6738 to fix
#6724 (sorry for the new PR @charliermarsh I didn't want to push to your
branch, not even sure if I could). The reason the original PR doesn't
fix the issue described in #6724 is because the fastapi is ran in the
project context (as I assume a lot of use cases are). This PR adds an
extra commit to handle the signals in the project/run.rs file

~It also addresses the comment
[here](https://github.com/astral-sh/uv/pull/6738/files#r1734757548) to
not use the tokio ctrl-c method since we are now handling SIGINT
ourselves~ update, tokio handles SIGINT in a platform agnostic way,
intercepting this ouselves makes the logic more complicated with
windows, decided to leave the tokio ctrl-c handler

~[This
comment](https://github.com/astral-sh/uv/pull/6738/files#r1743510140)
remains unaddressed, however, the Child process does not have any other
methods besides kill() so I don't see how we can "preserve" the
interrupt call :/ I tried looking around but no luck.~ updated, this PR
is reduced to only handling SIGTERM propagation on unix machines, and
the sigterm call to the child is preserved by making use of the nix
package, instead of relying on tokio which only allowed for `kill()` on
a child process

## Test Plan

I tested this by building the docker container locally with these
changes and tagging it "myuv", and then using that as the base image in
uv-docker-example, (and ofc following the rest of the repro issues in
#6724. In my tests I see that ctrl-c in the docker-compose up command
exits the process almost immediately 👍

---------

Co-authored-by: Charlie Marsh <[email protected]>
zanieb added a commit that referenced this pull request Jan 28, 2025
There should be two functional changes here:

- If we receive SIGINT twice, forward it to the child process
- If the `uv run` child process changes its PGID, then forward SIGINT

Previously, we never forwarded SIGINT to a child process. Instead, we
relied on shell to do so.

On Windows, we still do nothing but eat the Ctrl-C events we receive.
I cannot see an easy way to send them to the child.

The motivation for these changes should be explained in the comments.

Closes #10952 (in which Ray
changes its PGID)
Replaces the (much simpler) #10989 with a more comprehensive approach.

See #6738 (comment)
for some previous context.
zanieb added a commit that referenced this pull request Apr 21, 2025
As I suspected quite some time ago
(#6738 (comment)),
it's problematic that we don't handle _every_ signal here. This PR adds
handling for all of the Unix signals except `SIGCHLD`, `SIGIO`, and
`SIGPOLL` which seem incorrect to forward. Also notable, we _cannot_
handle `SIGKILL` so if someone sends that to the PID instead of the
PGID, they will leave dangling subprocesses.

Instead, we could use `exec` and avoid this handling. However, we'd lose
the ability to add nice error message on failure (e.g., as someone is
trying to add in #12201) and, more
critically, we'd need to figure out how to clean up resources properly
(i.e., temporary directories) which currently happens on `Drop`. In the
long-term, we'll probably want an option to use `exec` — but we'll need
to figure out when to clean up resources or accept that they will
dangle. This was last discussed in
#3095 — discussion on that
approach should continue there.

A note on the implementation: I spent time time trying to write the
handler using a tokio stream, so we could dynamically iterate over a
list of signals instead of copy/pasting the implementation — I couldn't
get it to work though and it didn't seem critical.

Closes #12830
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants