How to combine RetryPolicy.ON_ERROR with a specific RetryPolicy.ON_FAILURE + expression using OR logic? #1377
xiki-tempula asked this question in Q&A (unanswered, 0 replies)
Hi Hera maintainers and community,
I'm working on defining an Argo Workflow using Hera for a task that requires GPU resources. I want to implement a sophisticated retry strategy, but I'm unsure how to combine two different conditions with OR logic.
Goal:
I want my task to retry if either of the following conditions is met:
1. The pod fails with a system/node error (covered by `RetryPolicy.ON_ERROR`). This should cover cases like the node dying, preemption, or potentially some scheduling issues before the container runs properly. OR
2. The container itself fails (covered by `RetryPolicy.ON_FAILURE`) AND the failure message specifically contains the string "Allocate failed due to no healthy devices present".

What I Know:
I know how to create a RetryStrategy for each case individually using Hera:
Case 1 (`retry_on_any_error`): retry on any `OnError`.
Case 2 (`retry_on_specific_failure`): retry on `OnFailure` only if the message matches.
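For reference, here is a hypothetical sketch of what the two cases amount to at the level of the Argo `retryStrategy` spec, which Hera's `RetryStrategy` (fields `limit`, `retry_policy`, `expression`) is serialized to. Plain dicts are used so the sketch runs without Hera installed; the `limit` of 3 is an illustrative assumption, and the expression uses the `contains` string operator of the expr language that Argo evaluates `expression` with.

```python
# Hypothetical sketch: plain dicts mirroring Argo's retryStrategy fields
# (limit, retryPolicy, expression). The limit value of 3 is an assumption.
GPU_ALLOC_MSG = "Allocate failed due to no healthy devices present"

# Case 1: retry on any OnError (Argo controller errors, init/wait container failures).
retry_on_any_error = {
    "limit": 3,
    "retryPolicy": "OnError",
}

# Case 2: retry on OnFailure, but only when the last attempt's message
# contains the GPU allocation error string.
retry_on_specific_failure = {
    "limit": 3,
    "retryPolicy": "OnFailure",
    "expression": f'lastRetry.message contains "{GPU_ALLOC_MSG}"',
}

if __name__ == "__main__":
    import json

    print(json.dumps(retry_on_any_error, indent=2))
    print(json.dumps(retry_on_specific_failure, indent=2))
```

In Hera, each dict would correspond to one `RetryStrategy(...)` with `retry_policy` set to the matching `RetryPolicy` member.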
The Challenge:
I need the retry to trigger if the conditions for retry_on_any_error are met OR if the conditions for retry_on_specific_failure are met.
I looked through the RetryStrategy parameters and the Argo documentation, but I don't see an obvious way to express this OR condition between a general policy (OnError) and a specific filtered policy (OnFailure + expression). Setting retry_policy seems to take only one enum value, and the expression seems to filter within the chosen policy, not combine different policies.
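The difficulty can be made concrete with a small model of the retry decision. This is a simplified sketch, assuming the behaviour described in Argo's retry documentation (the `expression` result is ANDed with the `retryPolicy` check, so both must pass); it is not Argo's actual controller code, and `should_retry` and `gpu_filter` are hypothetical names.

```python
from typing import Callable, Optional

# Simplified model (assumption based on Argo's retry docs, not its source):
# which node phases each retryPolicy lets through to the expression check.
POLICY_PHASES = {
    "Always": {"Error", "Failed"},
    "OnError": {"Error"},
    "OnFailure": {"Failed"},
}

# An expression is modelled as a predicate over (status, message).
Expression = Callable[[str, str], bool]

GPU_MSG = "Allocate failed due to no healthy devices present"


def gpu_filter(status: str, message: str) -> bool:
    """Python stand-in for: lastRetry.message contains GPU_MSG."""
    return GPU_MSG in message


def should_retry(policy: str, expression: Optional[Expression], status: str, message: str) -> bool:
    """Return True if the modelled controller would retry this attempt."""
    if status not in POLICY_PHASES[policy]:
        return False  # the policy gate rejects the phase before the expression runs
    # A missing expression is treated as always true.
    return expression is None or expression(status, message)


if __name__ == "__main__":
    # Case 2 can never retry an Error, whatever the expression says.
    print(should_retry("OnFailure", gpu_filter, "Error", "node lost"))
    # Case 1 can never retry a Failed container, even with the GPU message.
    print(should_retry("OnError", None, "Failed", GPU_MSG))
```

Under this model, an expression can only narrow what the chosen policy already admits; it cannot add the phases of a second policy.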
Question:
Is it possible to configure a single RetryStrategy in Hera (and Argo) that achieves this combined "OR" logic? If so, how would I define it?
If it's not directly possible with a single strategy, are there recommended patterns or workarounds to achieve this behavior?
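One pattern that may work, assuming the AND semantics above, is to widen the policy gate to `Always` (which admits both `Error` and `Failed`) and move the whole OR condition into the expression, using the `lastRetry.status` and `lastRetry.message` variables. The sketch below is hypothetical: the `limit` value and the helper `combined_condition` are illustrative, and the Python function only mirrors the expr string for local sanity checks.

```python
GPU_ALLOC_MSG = "Allocate failed due to no healthy devices present"

# Hypothetical combined strategy: Always lets both phases through,
# and the expr expression implements the OR. limit=3 is illustrative.
combined_retry = {
    "limit": 3,
    "retryPolicy": "Always",
    "expression": (
        'lastRetry.status == "Error" || '
        f'(lastRetry.status == "Failed" && lastRetry.message contains "{GPU_ALLOC_MSG}")'
    ),
}


def combined_condition(status: str, message: str) -> bool:
    """Python mirror of the expr expression above."""
    return status == "Error" or (status == "Failed" and GPU_ALLOC_MSG in message)


if __name__ == "__main__":
    cases = [
        ("Error", "pod deleted"),   # node/system error: retried
        ("Failed", GPU_ALLOC_MSG),  # GPU allocation failure: retried
        ("Failed", "exit code 1"),  # ordinary container failure: not retried
    ]
    for status, message in cases:
        print(status, "|", message, "|", combined_condition(status, message))
```

In Hera this would be a single `RetryStrategy` with `retry_policy` set to the `Always` member of `RetryPolicy` and the same `expression` string. Whether the kubelet's allocation error actually surfaces in `lastRetry.message`, and whether such pods end up as `Failed` or `Error`, is worth confirming against a real failed run.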
Thanks in advance for any help or guidance!