Run: reap stray processes #6307

Merged
1 commit merged into containers:main on Aug 7, 2025

Conversation

@nalind (Member) commented Jul 31, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

When handling buildah run or a RUN instruction with an external runtime, after we've picked up the exit status of the "main" process that we're running, wait() for anything that was reparented to us before returning.
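
For illustration, here is a minimal, self-contained sketch of that reaping loop in the shape the review comments below suggest (the helper name reapStrays and the WNOHANG / 100-iteration / 100 ms details come from the PR; everything else, such as the package wrapper, is illustrative rather than the exact buildah source):

package main

import (
	"time"

	"golang.org/x/sys/unix"
)

// reapStrays polls, without blocking, for any children that were reparented
// to this process and collects their exit statuses so they do not linger as
// zombies. It gives up after a bounded number of attempts.
func reapStrays() {
	for range 100 {
		wpid, err := unix.Wait4(-1, nil, unix.WNOHANG, nil)
		if err != nil {
			// Typically ECHILD: no child processes left to wait for.
			break
		}
		if wpid == 0 {
			// Children exist but none have exited yet; wait briefly and retry.
			time.Sleep(100 * time.Millisecond)
			continue
		}
		// wpid > 0: reaped one stray child; keep looking for more.
	}
}

func main() {
	reapStrays()
}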

How to verify it

New integration test!

Which issue(s) this PR fixes:

Special notes for your reviewer:

This doesn't fix "the runtime crashes" cases, but it fixes "there are zombie processes left over when the runtime crashes".

Does this PR introduce a user-facing change?

None

@openshift-ci bot added the kind/bug and approved labels on Jul 31, 2025

Ephemeral COPR build failed. @containers/packit-build please check.

run: after we've picked up the exit status of the "main" process that
we're running, reap anything that was reparented to us before returning.

Signed-off-by: Nalin Dahyabhai <[email protected]>
	// we care about their exit status.
	logrus.Debugf("checking for reparented child processes")
	for range 100 {
		wpid, err := unix.Wait4(-1, nil, unix.WNOHANG, nil)

Contributor

Is this safe to do in a process containing a ~million lines of Go code? I’m worried that it might, now or in the future, interfere with some library’s private helpers.

Is that prevented somehow?

(My first thought is that the parent that anything is reparented to should be a single-purpose small process … but that process could just exit and have everything reparented to the true PID 1, without individually waiting, couldn’t it? It’s very possible I’m missing something.)

Member Author

This should only be getting called in a subprocess that we've spun off to babysit the runtime, and which has set itself as a child subreaper.
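
For context, the "child subreaper" part is roughly the following one-liner (a sketch assuming golang.org/x/sys/unix, not copied from the buildah source): once a process sets PR_SET_CHILD_SUBREAPER, orphaned descendants are reparented to it instead of to PID 1, which is what lets the Wait4 loop above see them.

package main

import (
	"log"

	"golang.org/x/sys/unix"
)

func main() {
	// Volunteer to reap orphaned descendants: when any process below us loses
	// its parent, the kernel reparents it to us rather than to init.
	if err := unix.Prctl(unix.PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0); err != nil {
		log.Fatalf("PR_SET_CHILD_SUBREAPER: %v", err)
	}
	// ... spawn and babysit the runtime here, then reap any strays ...
}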

Contributor

ACK.

			break
		}
		if wpid == 0 {
			time.Sleep(100 * time.Millisecond)

Contributor

Doesn’t this mean that the default execution of reapStrays will wait for 100*100 ms = 10 whole seconds??

Member Author

If there are no child processes (the most likely case), the WNOHANG should tell Wait4() to return immediately with an ECHILD error, and we'll break out of the loop.
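
A standalone demo of that behavior (not buildah code): a process with no children gets an immediate ECHILD from a WNOHANG wait, so the loop breaks without ever reaching the sleep.

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	// No children exist, so this returns immediately with an ECHILD error
	// rather than blocking or reporting wpid == 0.
	wpid, err := unix.Wait4(-1, nil, unix.WNOHANG, nil)
	fmt.Printf("wpid=%d err=%v\n", wpid, err)
}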

Contributor

I’m sorry, my mistake.

@mtrmac (Contributor) left a comment

Implementation LGTM, but I’m not sure why this is beneficial.

If the intermediate process exits without waiting, my reading of the kernel’s behavior is that the zombies will get reparented to init (or to some intermediate reaper), and that should generally be fine.

Is this because we might be running in an environment without a working init? Or because some intermediate parent might be setting PR_SET_CHILD_SUBREAPER and deals with unexpected orphans badly?

(And if this is needed, should Podman have something similar?)

@@ -1129,6 +1147,7 @@ func runUsingRuntimeMain() {
 	// Run the container, start to finish.
 	status, err := runUsingRuntime(options.Options, options.ConfigureNetwork, options.MoreCreateArgs, ospec, options.BundlePath, options.ContainerName, containerCreateW, containerStartR)
+	reapStrays()

Contributor

runUsingRuntime just did this; does doing it again make a difference?

Member Author

FWIW, the new test was failing pretty regularly without one call or the other.

Member Author

Hmm, it's possible I'm misremembering this.

@mtrmac (Contributor) commented Aug 5, 2025

… Is it possible for the “runtime” to exit immediately while its children continue running? If so, the children could be running for an ~unbounded amount of time (or until we somehow enumerate + kill them, I suppose), and there is no perfect time to run the reaping. (But, in production, if we run several build steps, we would reap them eventually.)

Member Author

If the "kill" invocation crashes and there's a process in the container that's left running, that's a definite possibility. In that case we'd hit the 10 second timeout (uh... twice) and give up.

Contributor

(I LGTMed the PR already, so none of this is blocking)

I think there are ~three separate concerns here:

  • Can this cleanup race vs. termination (or non-termination) of the indirect children, and if so, what to do in production?
    • As you say, yes, this could happen if the runtime’s kill fails — but I think this is not worth worrying too much about: if the runtime is crashing, something like reimplementing the runtime as a fallback is, to me, not attractive at all: A lot of complex code that is very unlikely to run (and that might be crashing for the same reason the otherwise-working runtime is crashing?) = a lot of extra risk and very little added value.
  • Assuming this cleanup can lose a race, does that mean we are adding a flaky test? That might be avoidable, and worth avoiding.
    • Maybe the crash command could (run /bin/true or something similar, lighter-weight than a shell, and) sleep for a few ms (or, uh, poll on /proc/$grandchildPid? probably not), to have the grandchild exit before crash kills itself, to ~ensure that at the point we are reaping during the test, the grandchild is very likely to be dead, and that we don’t run into the running-grandchild situation at test time.
    • If that’s the cause of the flake, I’d rather have a more complex test than two invocations of reapStrays in the production code.
  • (Tuning the heuristic of reapStrays. I’m sure there could be a lot of bikeshedding, but ultimately, if the process is not 100% reliable and a matter of engineering tradeoffs, and only relevant in a should-not-happen crash situation, I think the current code is just fine, and not something I think is worth trying to perfect.)

@nalind (Member Author) commented Aug 5, 2025

Is this because we might be running in an environment without a working init? Or because some intermediate parent might be setting PR_SET_CHILD_SUBREAPER and deals with unexpected orphans badly?

We had a report that a build running inside of a container was piling up unreaped processes when the runtime crashed, and in those cases pid 1 in the container is us. This fixes that part of it, at least, and the child process was already marking itself as a reaper, so in a sense it had already volunteered to take care of this.

(And if this is needed, should Podman have something similar?)

Podman's going through conmon, which I think already handles this, but we don't use conmon.

@mtrmac (Contributor) commented Aug 5, 2025

We had a report that a build running inside of a container … in those cases pid 1 in the container is us.

Thanks! LGTM.

@flouthoc (Collaborator) left a comment

LGTM

@flouthoc (Collaborator) commented Aug 7, 2025

/lgtm
/approve

openshift-ci bot commented Aug 7, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: flouthoc, nalind

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot bot merged commit f753f46 into containers:main on Aug 7, 2025
37 checks passed
@nalind deleted the reap branch on August 7, 2025 at 18:51
Labels: approved, kind/bug, lgtm