Run: reap stray processes #6307
Conversation
Ephemeral COPR build failed. @containers/packit-build please check.
run: after we've picked up the exit status of the "main" process that we're running, reap anything that was reparented to us before returning. Signed-off-by: Nalin Dahyabhai <[email protected]>
// we care about their exit status.
logrus.Debugf("checking for reparented child processes")
for range 100 {
	wpid, err := unix.Wait4(-1, nil, unix.WNOHANG, nil)
Is this safe to do in a process containing about a million lines of Go code? I’m worried that it might, now or in the future, interfere with some library’s private helpers.
Is that prevented somehow?
(My first thought is that the parent that anything is reparented to should be a single-purpose small process … but that process could just exit and have everything reparented to the true PID 1, without individually waiting, couldn’t it? It’s very possible I’m missing something.)
This should only be getting called in a subprocess that we've spun off to babysit the runtime, and which has set itself as a child subreaper.
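(For reference, “child subreaper” here refers to prctl(2)’s PR_SET_CHILD_SUBREAPER. Below is a minimal sketch of what that setup looks like with golang.org/x/sys/unix; the function name and call site are illustrative, not buildah’s actual code.)

```go
package main

import (
	"log"

	"golang.org/x/sys/unix"
)

// becomeSubreaper marks the current process as a child subreaper, so that
// orphaned descendants are reparented to it (rather than to PID 1) and it
// can pick up their exit statuses with wait().
func becomeSubreaper() error {
	return unix.Prctl(unix.PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0)
}

func main() {
	if err := becomeSubreaper(); err != nil {
		log.Fatalf("prctl(PR_SET_CHILD_SUBREAPER): %v", err)
	}
	// ... spawn the runtime, wait for it, then reap any reparented children.
}
```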
ACK.
	break
}
if wpid == 0 {
	time.Sleep(100 * time.Millisecond)
Doesn’t this mean that the default execution of reapStrays will wait for 100*100 ms = 10 whole seconds??
If there are no child processes (the most likely case), the WNOHANG should tell Wait4() to return immediately with an ECHILD error, and we'll break out of the loop.
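(To make that control flow concrete, here is a rough sketch of the reaping loop as reconstructed from the diff hunks above; it is an approximation, not a verbatim copy of the PR’s code.)

```go
package main

import (
	"time"

	"github.com/sirupsen/logrus"
	"golang.org/x/sys/unix"
)

// reapStrays picks up the exit status of any child processes that were
// reparented to us. WNOHANG makes Wait4() return immediately, and an error
// (typically ECHILD, meaning there are no children) breaks out of the loop,
// so the 100 * 100ms worst case only applies while children are still alive
// but have not exited yet.
func reapStrays() {
	logrus.Debugf("checking for reparented child processes")
	for range 100 {
		wpid, err := unix.Wait4(-1, nil, unix.WNOHANG, nil)
		if err != nil {
			// No children left to wait for (or some other wait error).
			break
		}
		if wpid == 0 {
			// Children exist, but none have exited yet; poll again shortly.
			time.Sleep(100 * time.Millisecond)
			continue
		}
		logrus.Debugf("reaped stray child process %d", wpid)
	}
}

func main() {
	reapStrays()
}
```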
I’m sorry, my mistake.
Implementation LGTM, but I’m not sure why this is beneficial.
If the intermediate process exits without waiting, I read the kernel to mean the zombies will get reparented to init (or to some intermediate reaper), and that should generally be fine.
Is this because we might be running in an environment without a working init? Or because some intermediate parent might be setting PR_SET_CHILD_SUBREAPER and deals with unexpected orphans badly?
(And if this is needed, should Podman have something similar?)
@@ -1129,6 +1147,7 @@ func runUsingRuntimeMain() {

	// Run the container, start to finish.
	status, err := runUsingRuntime(options.Options, options.ConfigureNetwork, options.MoreCreateArgs, ospec, options.BundlePath, options.ContainerName, containerCreateW, containerStartR)
	reapStrays()
runUsingRuntime just did this; does doing it again make a difference?
FWIW, the new test was failing pretty regularly without one call or the other.
Hmm, it's possible I'm misremembering this.
… Is it possible for the “runtime” to exit immediately, while its children continue to be running? If so, the children could be running for an ~unbounded amount of time (or until we somehow enumerate + kill them I suppose), and there is no perfect time to run the reaping. (But, in production, if we run several build steps, we would reap them eventually).
If the "kill" invocation crashes and there's a process in the container that's left running, that's a definite possibility. In that case we'd hit the 10 second timeout (uh... twice) and give up.
(I LGTMed the PR already, so none of this is blocking)
I think there are ~three separate concerns here:
- Can this cleanup race vs. termination (or non-termination) of the indirect children, and if so, what to do in production?
  - As you say, yes, this could happen if the runtime’s kill fails, but I think this is not worth worrying too much about: if the runtime is crashing, something like reimplementing the runtime as a fallback is, to me, not attractive at all: a lot of complex code that is very unlikely to run (and that might be crashing for the same reason the otherwise-working runtime is crashing?) = a lot of extra risk and very little added value.
- Assuming this cleanup can lose a race, does that mean we are adding a flaky test? That might be avoidable, and worth avoiding.
  - Maybe the crash command could (run /bin/true or something similar, lighter-weight than a shell, and) sleep for a few ms (or, uh, poll on /proc/$grandchildPid? probably not), to have the grandchild exit before crash kills itself, to ~ensure that at the point we are reaping during the test, the grandchild is very likely to be dead, and that we don’t run into the running-grandchild situation at test time.
  - If that’s the cause of the flake, I’d rather have a more complex test than two invocations of reapStrays in the production code.
- (Tuning the heuristic of reapStrays. I’m sure there could be a lot of bikeshedding, but ultimately, if the process is not 100% reliable and a matter of engineering tradeoffs, and only relevant in a should-not-happen crash situation, I think the current code is just fine, and not something I think is worth trying to perfect.)
We had a report that a build running inside of a container was piling up unreaped processes when the runtime crashed, and in those cases pid 1 in the container is us. This fixes that part of it, at least, and the child process was already marking itself as a reaper, so in a sense it had already volunteered to take care of this.
Podman's going through conmon, which I think already handles this, but we don't use conmon.
Thanks! LGTM.
LGTM
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: flouthoc, nalind
What type of PR is this?
/kind bug
What this PR does / why we need it:
When handling buildah run or a RUN instruction with an external runtime, after we've picked up the exit status of the "main" process that we're running, wait() for anything that was reparented to us before returning.

How to verify it
New integration test!
Which issue(s) this PR fixes:
Special notes for your reviewer:
This doesn't fix "the runtime crashes" cases, but it fixes "there are zombie processes left over when the runtime crashes".
Does this PR introduce a user-facing change?