Move ECS worker to event-based crash detection #18804
- Add tests for mark_runs_as_crashed function covering:
  * Non-zero exit code scenarios that trigger crashed state
  * Zero exit code scenarios that don't trigger crashed state
  * None exit code scenarios that trigger crashed state
  * Missing task ARN early exit
  * Flow run not found error handling
  * Final and scheduled state skipping
- Add tests for deregister_task_definition function covering:
  * Successful task definition deregistration
  * Missing task definition ARN early exit
  * Empty detail section handling
- Add tests for LastStatusFilter class covering:
  * No filter statuses (matches all)
  * Single status filtering
  * Multiple status filtering
  * All valid ECS task statuses
  * Final states and intermediate states groupings

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
This pull request is stale because it has been open 14 days with no activity. To keep this pull request open, remove the stale label or comment.
    def _observer_task_done(task: asyncio.Task[None]):
        if task.cancelled():
            logger.debug("ECS observer task cancelled")
        elif task.exception():
            logger.error("ECS observer task crashed", exc_info=task.exception())
        else:
            logger.debug("ECS observer task completed")
Need to ensure the observer is restarted on exception. Otherwise messages will back up in SQS.
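One way to address this is to wrap the observer in a small supervisor loop instead of only logging in a done callback. The following is a minimal sketch, not the code from this PR: `run_with_restart`, `observer_factory`, `max_restarts`, and `base_delay` are hypothetical names, and the exponential backoff is an assumption about what a reasonable retry policy might look like.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("ecs_observer_sketch")


async def run_with_restart(
    observer_factory: Callable[[], Awaitable[None]],  # hypothetical: builds a fresh observer coroutine
    max_restarts: int = 5,  # assumption: give up after this many restarts
    base_delay: float = 1.0,  # assumption: exponential backoff base, in seconds
) -> int:
    """Run the observer and restart it when it raises.

    Returns the number of restarts that were needed before a clean exit.
    """
    restarts = 0
    while True:
        try:
            await observer_factory()
            logger.debug("ECS observer task completed")
            return restarts
        except asyncio.CancelledError:
            # Cancellation is a deliberate shutdown, so do not restart.
            logger.debug("ECS observer task cancelled")
            raise
        except Exception:
            logger.error("ECS observer task crashed", exc_info=True)
            if restarts >= max_restarts:
                # Surface the failure rather than looping forever.
                raise
            await asyncio.sleep(base_delay * 2**restarts)
            restarts += 1
```

With a loop like this, the worker would schedule `run_with_restart(...)` as the background task, so a single exception in the observer no longer stops the queue from being drained.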
This PR moves the ECS worker from a polling-based design to an event-based design. With these changes, the ECS worker will submit ECS tasks, and the ECS observer will track the execution of those ECS tasks by receiving ECS task state change events from EventBridge via SQS.
This new design will require additional setup, but it will continue to function in a limited capacity if the configured SQS queue is unavailable. The worker will log a warning if SQS is not available.
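To illustrate the event flow described above, here is a rough sketch of how an observer might decide whether an ECS task state change event indicates a crash. It assumes the SQS message body is the EventBridge event JSON with `detail-type`, `detail.taskArn`, `detail.lastStatus`, and `detail.containers[].exitCode`. The function name `should_mark_crashed` is hypothetical, and the rules are inferred from the test descriptions in this PR (non-zero or missing exit codes count as a crash, a missing task ARN exits early), so the actual implementation may differ.

```python
import json
from typing import Optional


def should_mark_crashed(sqs_body: str) -> Optional[bool]:
    """Return True/False for a crash decision, or None if the event is ignored.

    Hypothetical helper: this is not a function from the PR.
    """
    event = json.loads(sqs_body)
    if event.get("detail-type") != "ECS Task State Change":
        return None  # not an ECS task state change event

    detail = event.get("detail") or {}
    if not detail.get("taskArn"):
        return None  # early exit: nothing to correlate with a flow run

    if detail.get("lastStatus") != "STOPPED":
        return False  # task has not reached a final status yet

    # Assumption: a container with no recorded exit code did not exit cleanly.
    containers = detail.get("containers") or []
    exit_codes = [container.get("exitCode") for container in containers]
    return any(code is None or code != 0 for code in exit_codes)
```

A real handler would additionally look up the flow run associated with the task and skip runs that are already in a final or scheduled state, as the test list for `mark_runs_as_crashed` suggests.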
Closes #18508