@desertaxle desertaxle commented Aug 27, 2025

This PR moves the ECS worker from a polling-based design to an event-based design. With these changes, the ECS worker will submit ECS tasks, and the ECS observer will track the execution of those ECS tasks by receiving ECS task state change events from EventBridge via SQS.

This new design requires additional setup, but the worker will continue to function in a limited capacity if the configured SQS queue is unavailable. In that case, the worker logs a warning.
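As a rough illustration of the event flow described above, the sketch below parses the body of an SQS message that carries an EventBridge "ECS Task State Change" event. It assumes the EventBridge rule targets the queue directly, so the message body is the event JSON itself. The names `TaskStateChange` and `parse_task_state_change` are hypothetical and are not taken from this PR.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskStateChange:
    """Hypothetical container for the fields an observer might care about."""

    task_arn: str
    last_status: str
    # One entry per container; None when the container reported no exit code.
    exit_codes: List[Optional[int]]


def parse_task_state_change(message_body: str) -> Optional[TaskStateChange]:
    """Parse an SQS message body holding an EventBridge event.

    Returns None for events that are not ECS task state changes or that
    carry no task ARN.
    """
    event = json.loads(message_body)
    if event.get("source") != "aws.ecs":
        return None
    if event.get("detail-type") != "ECS Task State Change":
        return None

    detail = event.get("detail") or {}
    task_arn = detail.get("taskArn")
    if not task_arn:
        return None

    return TaskStateChange(
        task_arn=task_arn,
        last_status=detail.get("lastStatus", "UNKNOWN"),
        exit_codes=[c.get("exitCode") for c in detail.get("containers", [])],
    )
```

Keeping the parsing free of any AWS client calls makes it possible to test it with plain JSON strings.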

Closes #18508

@desertaxle desertaxle changed the title from "prefect aws 0.6.0" to "Move ECS worker to event-based crash detection" Aug 27, 2025
@github-actions github-actions bot added the enhancement (An improvement of an existing feature) and integrations (Related to integrations with other services) labels Aug 27, 2025
desertaxle and others added 6 commits August 27, 2025 16:23
- Add tests for mark_runs_as_crashed function covering:
  * Non-zero exit code scenarios that trigger crashed state
  * Zero exit code scenarios that don't trigger crashed state
  * None exit code scenarios that trigger crashed state
  * Missing task ARN early exit
  * Flow run not found error handling
  * Final and scheduled state skipping
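
The scenarios listed above suggest a decision rule along the following lines. This is only a sketch of that rule, not the PR's `mark_runs_as_crashed` implementation: the function name `should_mark_crashed`, the use of plain strings for state types, and the treatment of an empty container list are all assumptions.

```python
from typing import Optional, Sequence

# Assumption: flow runs in a final state or still scheduled are left alone.
SKIPPED_STATE_TYPES = frozenset(
    {"COMPLETED", "FAILED", "CANCELLED", "CRASHED", "SCHEDULED"}
)


def should_mark_crashed(
    exit_codes: Sequence[Optional[int]], current_state_type: str
) -> bool:
    """Decide whether a stopped task should move its flow run to a crashed state.

    A non-zero or missing (None) exit code counts as a crash; an exit code
    of zero does not. An empty sequence is treated as "no crash" here,
    which is an assumption of this sketch.
    """
    if current_state_type in SKIPPED_STATE_TYPES:
        return False
    return any(code is None or code != 0 for code in exit_codes)
```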

- Add tests for deregister_task_definition function covering:
  * Successful task definition deregistration
  * Missing task definition ARN early exit
  * Empty detail section handling

- Add tests for LastStatusFilter class covering:
  * No filter statuses (matches all)
  * Single status filtering
  * Multiple status filtering
  * All valid ECS task statuses
  * Final states and intermediate states groupings
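
For readers unfamiliar with the idea, a status filter with the behavior these tests describe could look roughly like the sketch below. The class name `StatusFilterSketch` and its `is_match` method are hypothetical, and the real `LastStatusFilter` may differ. The status names are the ECS task lifecycle values.

```python
from typing import FrozenSet

# ECS task lifecycle statuses, as reported in the task's lastStatus field.
ECS_TASK_STATUSES = (
    "PROVISIONING",
    "PENDING",
    "ACTIVATING",
    "RUNNING",
    "DEACTIVATING",
    "STOPPING",
    "DEPROVISIONING",
    "STOPPED",
    "DELETED",
)


class StatusFilterSketch:
    """Match an event's last status against an optional set of statuses."""

    def __init__(self, *statuses: str) -> None:
        self.statuses: FrozenSet[str] = frozenset(statuses)

    def is_match(self, last_status: str) -> bool:
        # With no statuses configured, the filter matches everything.
        return not self.statuses or last_status in self.statuses
```

A filter built with no arguments matches every status, while `StatusFilterSketch("STOPPED")` matches only stopped tasks.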

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

github-actions bot commented Oct 1, 2025

This pull request is stale because it has been open 14 days with no activity. To keep this pull request open remove stale label or comment.

Comment on lines +576 to +582
def _observer_task_done(task: asyncio.Task[None]):
    if task.cancelled():
        logger.debug("ECS observer task cancelled")
    elif task.exception():
        logger.error("ECS observer task crashed", exc_info=task.exception())
    else:
        logger.debug("ECS observer task completed")

Need to ensure the observer is restarted on exception. Otherwise, messages will back up in SQS.
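
One possible way to address this, sketched under assumptions rather than taken from the PR, is to wrap the observer coroutine in a small supervisor loop that restarts it after an unexpected exception. The helper name `run_with_restart` and its parameters `observer_factory`, `restart_delay`, and `max_restarts` are hypothetical.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger(__name__)


async def run_with_restart(
    observer_factory: Callable[[], Awaitable[None]],
    restart_delay: float = 1.0,
    max_restarts: Optional[int] = None,
) -> None:
    """Run the observer, restarting it whenever it raises.

    Cancellation is propagated so shutdown still works. When max_restarts
    is set and exhausted, the last exception is re-raised.
    """
    restarts = 0
    while True:
        try:
            await observer_factory()
            logger.debug("ECS observer task completed")
            return
        except asyncio.CancelledError:
            logger.debug("ECS observer task cancelled")
            raise
        except Exception:
            logger.exception("ECS observer task crashed")
            if max_restarts is not None and restarts >= max_restarts:
                raise
            restarts += 1
            # Pause briefly so a persistent failure does not spin the loop.
            await asyncio.sleep(restart_delay)
```

The factory is called again on each restart, so every attempt gets a fresh coroutine. A fixed delay keeps the sketch short; exponential backoff would be a reasonable alternative.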

Development

Successfully merging this pull request may close these issues.

Implement event-driven ECS task state observation for ECS worker
