Run: reap stray processes #6307

Merged
1 commit merged into containers:main on Aug 7, 2025

Conversation

@nalind (Member) commented Jul 31, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

When handling buildah run or a RUN instruction with an external runtime, after we've picked up the exit status of the "main" process that we're running, wait() for anything that was reparented to us before returning.
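
For illustration, here is a minimal, self-contained sketch of that reaping loop in the shape the review comments below suggest (the helper name reapStrays and the WNOHANG / 100-iteration / 100 ms details come from the PR; everything else, such as the package wrapper, is illustrative rather than the exact buildah source):

package main

import (
	"time"

	"golang.org/x/sys/unix"
)

// reapStrays polls, without blocking, for any children that were reparented
// to this process and collects their exit statuses so they do not linger as
// zombies. It gives up after a bounded number of attempts.
func reapStrays() {
	for range 100 {
		wpid, err := unix.Wait4(-1, nil, unix.WNOHANG, nil)
		if err != nil {
			// Typically ECHILD: no child processes left to wait for.
			break
		}
		if wpid == 0 {
			// Children exist but none have exited yet; wait briefly and retry.
			time.Sleep(100 * time.Millisecond)
			continue
		}
		// wpid > 0: reaped one stray child; keep looking for more.
	}
}

func main() {
	reapStrays()
}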

How to verify it

New integration test!

Which issue(s) this PR fixes:

Special notes for your reviewer:

This doesn't fix "the runtime crashes" cases, but it fixes "there are zombie processes left over when the runtime crashes".

Does this PR introduce a user-facing change?

None

@openshift-ci bot added the kind/bug and approved labels on Jul 31, 2025

Ephemeral COPR build failed. @containers/packit-build please check.

run: after we've picked up the exit status of the "main" process that
we're running, reap anything that was reparented to us before returning.

Signed-off-by: Nalin Dahyabhai <[email protected]>
	// we care about their exit status.
	logrus.Debugf("checking for reparented child processes")
	for range 100 {
		wpid, err := unix.Wait4(-1, nil, unix.WNOHANG, nil)

Contributor

Is this safe to do in a process containing a ~million lines of Go code? I’m worried that it might, now or in the future, interfere with some library’s private helpers.

Is that prevented somehow?

(My first thought is that the parent that anything is reparented to should be a single-purpose small process … but that process could just exit and have everything reparented to the true PID 1, without individually waiting, couldn’t it? It’s very possible I’m missing something.)

Member Author

This should only be getting called in a subprocess that we've spun off to babysit the runtime, and which has set itself as a child subreaper.
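
For context, the "child subreaper" part is roughly the following one-liner (a sketch assuming golang.org/x/sys/unix, not copied from the buildah source): once a process sets PR_SET_CHILD_SUBREAPER, orphaned descendants are reparented to it instead of to PID 1, which is what lets the Wait4 loop above see them.

package main

import (
	"log"

	"golang.org/x/sys/unix"
)

func main() {
	// Volunteer to reap orphaned descendants: when any process below us loses
	// its parent, the kernel reparents it to us rather than to init.
	if err := unix.Prctl(unix.PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0); err != nil {
		log.Fatalf("PR_SET_CHILD_SUBREAPER: %v", err)
	}
	// ... spawn and babysit the runtime here, then reap any strays ...
}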

Contributor

ACK.

			break
		}
		if wpid == 0 {
			time.Sleep(100 * time.Millisecond)

Contributor

Doesn’t this mean that the default execution of reapStrays will wait for 100*100 ms = 10 whole seconds??

Member Author

If there are no child processes (the most likely case), the WNOHANG should tell Wait4() to return immediately with an ECHILD error, and we'll break out of the loop.
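
A standalone demo of that behavior (not buildah code): a process with no children gets an immediate ECHILD from a WNOHANG wait, so the loop breaks without ever reaching the sleep.

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	// No children exist, so this returns immediately with an ECHILD error
	// rather than blocking or reporting wpid == 0.
	wpid, err := unix.Wait4(-1, nil, unix.WNOHANG, nil)
	fmt.Printf("wpid=%d err=%v\n", wpid, err)
}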

Contributor

I’m sorry, my mistake.

@mtrmac (Contributor) left a comment

Implementation LGTM, but I’m not sure why this is beneficial.

If the intermediate process exits without waiting, my reading of the kernel’s behavior is that the zombies will get reparented to init (or to some intermediate reaper), and that should generally be fine.

Is this because we might be running in an environment without a working init? Or because some intermediate parent might be setting PR_SET_CHILD_SUBREAPER and deals with unexpected orphans badly?

(And if this is needed, should Podman have something similar?)

@@ -1129,6 +1147,7 @@ func runUsingRuntimeMain() {
 	// Run the container, start to finish.
 	status, err := runUsingRuntime(options.Options, options.ConfigureNetwork, options.MoreCreateArgs, ospec, options.BundlePath, options.ContainerName, containerCreateW, containerStartR)
+	reapStrays()

Contributor

runUsingRuntime just did this; does doing it again make a difference?

Member Author

FWIW, the new test was failing pretty regularly without one call or the other.

Member Author

Hmm, it's possible I'm misremembering this.

@mtrmac (Contributor) commented Aug 5, 2025

… Is it possible for the “runtime” to exit immediately while its children continue running? If so, the children could be running for an ~unbounded amount of time (or until we somehow enumerate + kill them, I suppose), and there is no perfect time to run the reaping. (But, in production, if we run several build steps, we would reap them eventually.)

Member Author

If the "kill" invocation crashes and there's a process in the container that's left running, that's a definite possibility. In that case we'd hit the 10 second timeout (uh... twice) and give up.

Contributor

(I LGTMed the PR already, so none of this is blocking)

I think there are ~three separate concerns here:

  • Can this cleanup race vs. termination (or non-termination) of the indirect children, and if so, what to do in production?
    • As you say, yes, this could happen if the runtime’s kill fails — but I think this is not worth worrying too much about: if the runtime is crashing, something like reimplementing the runtime as a fallback is, to me, not attractive at all: A lot of complex code that is very unlikely to run (and that might be crashing for the same reason the otherwise-working runtime is crashing?) = a lot of extra risk and very little added value.
  • Assuming this cleanup can lose a race, does that mean we are adding a flaky test? That might be avoidable, and worth avoiding.
    • Maybe the crash command could (run /bin/true or something similar, lighter-weight than a shell, and) sleep for a few ms (or, uh, poll on /proc/$grandchildPid? probably not), to have the grandchild exit before crash kills itself, to ~ensure that at the point we are reaping during the test, the grandchild is very likely to be dead, and that we don’t run into the running-grandchild situation at test time.
    • If that’s the cause of the flake, I’d rather have a more complex test than two invocations of reapStrays in the production code.
  • (Tuning the heuristic of reapStrays. I’m sure there could be a lot of bikeshedding, but ultimately, if the process is not 100% reliable and a matter of engineering tradeoffs, and only relevant in a should-not-happen crash situation, I think the current code is just fine, and not something I think is worth trying to perfect.)

@nalind (Member Author) commented Aug 5, 2025

Is this because we might be running in an environment without a working init? Or because some intermediate parent might be setting PR_SET_CHILD_SUBREAPER and deals with unexpected orphans badly?

We had a report that a build running inside of a container was piling up unreaped processes when the runtime crashed, and in those cases pid 1 in the container is us. This fixes that part of it, at least, and the child process was already marking itself as a reaper, so in a sense it had already volunteered to take care of this.

(And if this is needed, should Podman have something similar?)

Podman's going through conmon, which I think already handles this, but we don't use conmon.

@mtrmac (Contributor) commented Aug 5, 2025

We had a report that a build running inside of a container … in those cases pid 1 in the container is us.

Thanks! LGTM.

@flouthoc (Collaborator) left a comment

LGTM

@flouthoc (Collaborator) commented Aug 7, 2025

/lgtm
/approve

openshift-ci bot commented Aug 7, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: flouthoc, nalind

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot bot merged commit f753f46 into containers:main on Aug 7, 2025
37 checks passed
@nalind deleted the reap branch on August 7, 2025 at 18:51
Labels: approved, kind/bug, lgtm