fix: abort scenario where canary/stable service is not provided #4299
Signed-off-by: Peter Jiang <[email protected]>
Published E2E Test Results: 4 files, 4 suites, 3h 16m 18s. For more details on these failures, see this check. Results for commit 515799e.
Published Unit Test Results: 2 311 tests, 2 311 passed, 3m 1s. Results for commit 515799e.
* fix: fix abort scenario where canary/stable service is not provided
* remove not needed code
* Modularize tests
* fix tests
* Refactor code to handle only checking relevant RS and return nil
* update test
* update test

Signed-off-by: Peter Jiang <[email protected]>
… canary on istio but do start scaling down old canaries RS Signed-off-by: Niko Kurtti <[email protected]>
…nsitions when stable RS not ready

Fixes issue argoproj#4390 where traffic routing rules could get out of sync with scaling operations during canary deployment transitions, causing 503/UH errors.

When a new canary deployment starts while the stable ReplicaSet is not fully available, the UpdateHash method now properly returns an error to prevent the DestinationRule update, ensuring old canary ReplicaSets are not scaled down while still receiving traffic.

Key changes:
- Enhanced UpdateHash logic to distinguish between no-services and services-defined scenarios
- For rollouts with defined services: only block on stable RS during normal transitions
- For rollouts without services: preserve original argoproj#2507 behavior
- For abort scenarios: allow updates to proceed to avoid deadlock (argoproj#4299)
- Added comprehensive test coverage for all scenarios

Fixes argoproj#4390
Preserves fixes for argoproj#2507 and argoproj#4299

Signed-off-by: Niko Kurtti <[email protected]>
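The key changes listed in that commit message can be read as a small decision table. The Go sketch below is only an illustration of that table, not the controller's code: `rsState`, `switchInput`, and `checkSwitch` are hypothetical names, and the no-services branch assumes that the argoproj#2507 behavior means blocking while either ReplicaSet is not fully available, which the commit message does not spell out.

```go
package main

import "fmt"

// rsState is a hypothetical, simplified view of a ReplicaSet.
type rsState struct {
	Name           string
	FullyAvailable bool
}

// switchInput is a hypothetical bundle of the facts the decision depends on.
type switchInput struct {
	Aborted         bool // rollout is being aborted (manual or progress deadline)
	ServicesDefined bool // canary/stable services are set on the rollout
	Stable          rsState
	Canary          rsState
}

// delayErr mirrors the wording of the message quoted in the PR description.
func delayErr(rs rsState) error {
	return fmt.Errorf("delaying destination rule switch: ReplicaSet %s not fully available", rs.Name)
}

// checkSwitch returns nil when the DestinationRule update may proceed and
// an error when it should be delayed.
func checkSwitch(in switchInput) error {
	// Abort scenarios: let the update proceed to avoid a deadlock.
	if in.Aborted {
		return nil
	}
	// Both remaining scenarios block on an unavailable stable ReplicaSet.
	if !in.Stable.FullyAvailable {
		return delayErr(in.Stable)
	}
	// ASSUMPTION: without services, an unavailable canary also blocks.
	if !in.ServicesDefined && !in.Canary.FullyAvailable {
		return delayErr(in.Canary)
	}
	return nil
}
```

Under these assumptions, a rollout with services defined and a stable ReplicaSet that is not ready is delayed, while the same state during an abort is allowed through.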
Fixes #4128
When ping-pong support for Istio was introduced in https://github.com/argoproj/argo-rollouts/pull/3371/files, that PR removed the validation of the canary/stable services for rollouts that use a DestinationRule. This caused a deadlock when aborting a rollout that has a replica that is not ready:
"delaying destination rule switch: ReplicaSet my-srvc-64b664b6f not fully available"
With this fix:
This fixes both abort scenarios for rollouts: a manual abort and a progressDeadlineAbort.
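To make the deadlock concrete, here is a minimal Go sketch of a reconcile loop that keeps retrying the availability check. It is a simplified model rather than the controller's implementation: `replicaSet`, `checkAll`, `checkRelevant`, and `reconcile` are hypothetical names, and returning nil on abort is an assumption drawn from the commit note about checking only the relevant RS and returning nil.

```go
package main

import "fmt"

// replicaSet is a hypothetical, simplified ReplicaSet view.
type replicaSet struct {
	Name      string
	Available bool
}

// checkAll models the behavior before the fix as described above:
// any ReplicaSet that is not fully available delays the switch.
func checkAll(sets []replicaSet) error {
	for _, rs := range sets {
		if !rs.Available {
			return fmt.Errorf("delaying destination rule switch: ReplicaSet %s not fully available", rs.Name)
		}
	}
	return nil
}

// checkRelevant models the fixed behavior under the stated assumption:
// an aborted rollout is not blocked by ReplicaSets that will never become ready.
func checkRelevant(sets []replicaSet, aborted bool) error {
	if aborted {
		return nil
	}
	return checkAll(sets)
}

// reconcile retries check up to maxTries times and reports how many
// attempts were made and whether the check ever passed.
func reconcile(check func() error, maxTries int) (int, bool) {
	for i := 1; i <= maxTries; i++ {
		if check() == nil {
			return i, true
		}
	}
	return maxTries, false
}
```

In this model, an aborted rollout whose canary ReplicaSet never becomes available exhausts every retry with `checkAll`, which is the deadlock, while `checkRelevant` passes on the first attempt.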