-
Couldn't load subscription status.
- Fork 180
Extract goroutine management and clean up into a separate package #1362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
# Conflicts: # packages/orchestrator/internal/sandbox/network/pool.go
| tasks = append(tasks, supervisor.Task{ | ||
| Name: "template manager drain", | ||
| Cleanup: tmpl.Wait, | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Reverse Cleanup Order Causes Shutdown Delays
The supervisor package runs cleanup tasks in reverse order of addition, which inadvertently reverses the intended shutdown sequence. The template manager drain now executes before the orchestrator service is marked as draining and its propagation delay. This also delays the service's draining status change until the cleanup phase, potentially routing requests to the instance longer than intended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is intentional, the orchestrator and template manager are not actually related to each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to happen AFTER the orchestrator draining status
| cancelCloseCtx() | ||
| } | ||
| tasks = append(tasks, supervisor.Task{ | ||
| Name: "orchestrator drain", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to happen as the first action when shutting down the orchestrator or the template manager
| tasks = append(tasks, supervisor.Task{ | ||
| Name: "template manager drain", | ||
| Cleanup: tmpl.Wait, | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to happen AFTER the orchestrator draining status
| err = supervisor.Run(ctx, supervisor.Options{ | ||
| ForceStop: config.ForceStop, | ||
| Tasks: tasks, | ||
| Logger: globalLogger, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bit strange, the logger is closed by the supervisor, but it's also used there
| ctx, cancel := context.WithCancel(ctx) | ||
| defer cancel() | ||
|
|
||
| startTask := func(ctx context.Context, task Task) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this function be extracted outside of the body?
|
|
||
| type Options struct { | ||
| ForceStop bool | ||
| Tasks []Task |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be a []*Task to prevent copy of the Task functions? (but maybe it's ok, not sure now)
| taskLogger := baseLogger.Named(task.Name) | ||
| if task.Cleanup != nil { | ||
| taskLogger.Info("running cleanup") | ||
| if err := task.Cleanup(ctx); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the close context should be different from the background tasks context
This allows us to test and standardize how we start, monitor, cancel, and clean up our goroutines. This PR only adds it to orchestrator, but we should add this to api, client-proxy, and the rest as well.
Note
Adds a shared supervisor to manage background tasks and cleanup, and migrates orchestrator to use it instead of custom errgroup/signal/closer logic.
packages/shared/pkg/supervisor: New task supervisor withTask,Options, andRunto start background tasks, handle OS signals, cancel via context, and run reverse-order cleanups; includes tests.packages/orchestrator/main.go: Replaceerrgroup/signal/closerscaffolding with a[]supervisor.Taskfor telemetry, loggers (withcleanupLogger), feature flags, limiter, Redis/pubsub, sandbox observer, proxy, NBD device pool, network pool, hyperloop server, cmux, HTTP, and gRPC servers, plus drain steps; invokesupervisor.Runfor lifecycle and error handling.Written by Cursor Bugbot for commit f09069b. This will update automatically on new commits. Configure here.