Skip to content

Improve droplet upload draining #245

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

kathap
Copy link
Contributor

@kathap kathap commented Jul 18, 2025

  • In main.go signal handler:
    • Wait up to the timeout for in‑flight uploads to complete by racing a uploadWaitGroup.Wait() against timeout.
    • Finally, signal the Ifrit process group (via monitor.Signal(os.Interrupt)) to hard‑stop any remaining runners and exit.

  • In upload_droplet.ServeHTTP:
    • Retain existing logging and WaitGroup tracking for in-flight requests.
    • replace deprecated CloseNotifier with context-based cancellation:
    • Enhance per-request flow, introduce context.WithTimeout(r.Context(), timeout) to produce a single ctx.Done() channel that:
    • Cancels when the client disconnects (r.Context()).
    • Cancels when the configured timeout elapses.

This ensures that ongoing droplet uploads finish cleanly while preventing new requests during shutdown.

Related PRs:
cloudfoundry/capi-release#562

Related issue:
cloudfoundry/cloud_controller_ng#4351

- Introduce a global `draining` flag in `main.go` (atomic int32) to signal shutdown mode.
- In `main.go` signal handler:
  • Set `draining = 1` on SIGTERM so new uploads are refused.
  • Invoke Ifrit group shutdown, then call `Shutdown()` on both HTTP servers with a 300s timeout.
  • Wait on `uploadWaitGroup` to let all in-flight uploads and polls complete before exit.
- In `upload_droplet.ServeHTTP`:
  • Early-return HTTP 503 if `draining==1` (`rejecting-request-draining` log).
  • Retain existing logging and WaitGroup tracking for in-flight requests.
- Enhance per-request flow to use `context.WithTimeout(r.Context(), timeout)`:
  • Automatically abort upload and polling if the client disconnects or the timeout elapses.
  • Pass `ctx.Done()` into both upload and poll routines.

This ensures that ongoing droplet uploads finish cleanly while preventing new requests during shutdown.
@kathap kathap force-pushed the improve-droplet-upload-draining branch from 2d87d5d to f87cac1 Compare July 21, 2025 07:15
@kathap kathap force-pushed the improve-droplet-upload-draining branch from 07c903e to b5eddc1 Compare July 21, 2025 13:06
@kathap kathap marked this pull request as ready for review July 24, 2025 07:05
@kathap kathap marked this pull request as draft July 29, 2025 14:54
@kathap kathap marked this pull request as ready for review July 30, 2025 14:21
Copy link

@jochenehret jochenehret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general, but I still had some questions...

@kathap kathap marked this pull request as draft August 14, 2025 08:01
@kathap kathap marked this pull request as ready for review August 15, 2025 09:39
@kathap kathap force-pushed the improve-droplet-upload-draining branch from fceb7cf to a5b111a Compare August 15, 2025 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants