Sovietaced
Member

@Sovietaced Sovietaced commented May 1, 2025

Tracking issue

Closes #6435

Why are the changes needed?

Without these changes, followers (replicas) in highly available deployments enqueue workflows until they saturate the work queue. Once the queue is saturated, the queued workflows can become very stale. When a follower later becomes the leader, it has to churn through these stale workflows, some of which might not even exist anymore.

What changes were proposed in this pull request?

Added a thread-safe boolean that records whether this propeller instance is the leader, and check it before enqueueing workflows.
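
For readers skimming the description, the guard can be sketched roughly as follows. This is a minimal illustration, not the code in controller.go: the `controller` type, its `queue` slice, and the `enqueue` method are hypothetical stand-ins, and only the `leader.Load()` check mirrors what the diff shows. It assumes Go 1.19 or later for `atomic.Bool`.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// controller is a hypothetical stand-in for the propeller controller.
// Only the leader flag and a toy queue are modelled here.
type controller struct {
	leader atomic.Bool // safe for concurrent reads and writes
	queue  []string    // toy queue; the real work queue is assumed to do its own locking
}

// enqueue adds key to the queue only while this instance is the leader,
// and reports whether the key was accepted.
func (c *controller) enqueue(key string) bool {
	if !c.leader.Load() {
		return false // follower: skip the event instead of queueing it
	}
	c.queue = append(c.queue, key)
	return true
}

func main() {
	c := &controller{}
	fmt.Println(c.enqueue("project/wf-a")) // false: still a follower
	c.leader.Store(true)
	fmt.Println(c.enqueue("project/wf-b")) // true: now the leader
	fmt.Println(c.queue)                   // [project/wf-b]
}
```

The zero value of `atomic.Bool` is false, so an instance starts out as a follower until something explicitly marks it as leader.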

How was this patch tested?

Not yet tested; I will test it in our development environment.

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Summary by Bito

This PR implements leader-only workflow enqueuing in HA flytepropeller deployments using an atomic boolean flag, preventing followers from adding stale workflows to the queue. It enhances system reliability and consistency during leadership transitions, and refines workflow handling. The changes also improve logging of workflow phases for failed executions to facilitate better monitoring and debugging.

Unit tests added: False

Estimated effort to review (1-5, lower is better): 1

@Sovietaced Sovietaced added the fixed For any bug fixes label May 1, 2025
@flyte-bot
Collaborator

flyte-bot commented May 1, 2025

Code Review Agent Run Status

  • Limitations and other issues: ❌ Failure - Bito Automatic Review Skipped - Draft PR

    Bito didn't auto-review because this pull request is in draft status.
    To trigger review, mark the PR as ready or type /review in the comment and save.
    You can change draft PR review settings here, or contact the agent instance creator at [email protected].

@Sovietaced Sovietaced marked this pull request as ready for review May 1, 2025 03:58
Signed-off-by: Jason Parraga <[email protected]>
@Sovietaced Sovietaced changed the title Only enqueue workflows if leader Only enqueue workflows if flytepropeller leader May 1, 2025

codecov bot commented May 1, 2025

Codecov Report

Attention: Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.

Project coverage is 58.48%. Comparing base (beb6422) to head (3f88808).
Report is 27 commits behind head on master.

Files with missing lines                       Patch %   Lines
flytepropeller/pkg/controller/controller.go    0.00%     6 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6436   +/-   ##
=======================================
  Coverage   58.47%   58.48%           
=======================================
  Files         940      940           
  Lines       71579    71585    +6     
=======================================
+ Hits        41858    41864    +6     
  Misses      26538    26538           
  Partials     3183     3183           
Flag                        Coverage Δ
unittests-datacatalog       59.03% <ø> (ø)
unittests-flyteadmin        56.27% <ø> (+0.02%) ⬆️
unittests-flytecopilot      30.99% <ø> (ø)
unittests-flytectl          64.72% <ø> (ø)
unittests-flyteidl          76.12% <ø> (ø)
unittests-flyteplugins      60.95% <ø> (ø)
unittests-flytepropeller    54.76% <0.00%> (-0.02%) ⬇️
unittests-flytestdlib       64.02% <ø> (ø)


@flyte-bot
Collaborator

flyte-bot commented May 1, 2025

Code Review Agent Run #99bf3d

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: b023908..331298e
    • flytepropeller/pkg/controller/controller.go
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

Refer to the documentation for additional commands.

Configuration

This repository uses code_review_bito. You can customize the agent settings here or contact your Bito workspace admin at [email protected].

Documentation & Help


@flyte-bot
Collaborator

flyte-bot commented May 1, 2025

Changelist by Bito

This pull request implements the following key changes.

Key Change Files Impacted
Testing - Test Debug Logging Enhancement

test_run.py - Added a debug print to log the workflow execution phase, aiding diagnostics during end-to-end tests.

Bug Fix - Leader-based Workflow Enqueuing Fix

controller.go - Introduced an atomic boolean for leader detection; updated the code to ensure that only the leader enqueues workflows, preventing non-leader replicas from saturating the work queue.

@Sovietaced Sovietaced changed the title Only enqueue workflows if flytepropeller leader Only enqueue workflows if flytepropeller is leader May 1, 2025
@Sovietaced
Member Author

The end2end test is failing, but I've tested this in a development environment without issues. Hmm.

Signed-off-by: Jason Parraga <[email protected]>
@flyte-bot
Collaborator

flyte-bot commented May 1, 2025

Code Review Agent Run #a53844

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: 331298e..3f88808
    • boilerplate/flyte/end2end/test_run.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful


		return
	}
	key := wf.GetK8sWorkflowID()
	if !c.leader.Load() {
Member


If the leader elector is not configured, it seems c.leader will never be set to true, so this check will skip every workflow and never enqueue any of them.
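
One possible way to address this concern, sketched under assumptions: treat an instance with no elector as the leader at construction time. The `elector` type and `newController` function below are hypothetical placeholders and do not reflect the actual flytepropeller types; only the `leader` flag corresponds to the field discussed in this thread.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// elector is a hypothetical placeholder for a leader-election handle.
// A nil *elector means leader election is not configured.
type elector struct{}

// controller is a hypothetical stand-in for the propeller controller.
type controller struct {
	leader  atomic.Bool
	elector *elector
}

// newController marks a non-elected (single instance) controller as leader
// so that the enqueue guard does not silently skip every workflow.
func newController(e *elector) *controller {
	c := &controller{elector: e}
	if e == nil {
		c.leader.Store(true)
	}
	return c
}

func main() {
	fmt.Println(newController(nil).leader.Load())        // true: no election configured
	fmt.Println(newController(&elector{}).leader.Load()) // false: must win the lease first
}
```

An alternative would be to skip the leader check entirely when no elector is configured; which option fits better depends on how the controller is wired up.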


<-backgroundCtx.Done()

logger.Infof(ctx, "Lost leader lease.")
Member


Do we need to set c.leader to false here to prevent it from continuing to enqueue workflows once it is no longer the leader?
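
A rough sketch of what clearing the flag could look like, using only the standard library. `runAsLeader` and `leaderAfterLeaseLoss` are hypothetical names, and cancellation of the context is assumed to stand in for losing the lease, as the `<-backgroundCtx.Done()` line above suggests.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
)

// controller is a hypothetical stand-in for the propeller controller.
type controller struct {
	leader atomic.Bool
}

// runAsLeader is a hypothetical callback invoked once the lease is acquired.
// It blocks until ctx is cancelled (assumed here to signal lease loss); the
// deferred Store clears the flag on every exit path.
func (c *controller) runAsLeader(ctx context.Context) {
	c.leader.Store(true)
	defer c.leader.Store(false)
	<-ctx.Done()
}

// leaderAfterLeaseLoss simulates acquiring and then losing the lease, and
// returns the value of the leader flag afterwards.
func leaderAfterLeaseLoss() bool {
	c := &controller{}
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go func() {
		defer close(done)
		c.runAsLeader(ctx)
	}()
	cancel() // simulate losing the lease
	<-done   // wait for runAsLeader to return
	return c.leader.Load()
}

func main() {
	fmt.Println(leaderAfterLeaseLoss()) // false
}
```

Using `defer` keeps the reset next to the point where the flag is set, so an early return cannot leave a former leader enqueueing workflows.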

@Sovietaced
Member Author

@machichima this one wasn't ready for review yet since I was trying to work out the build failure. I'll circle back later.

Labels
fixed For any bug fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Flyte Propeller Followers/Replicas Enqueue Workflows And Fill Up Queue
3 participants