TEP-0090: Matrix - Failure Strategies #724
cc @abayer @dibyom @lbernick @pritidesai please take a look
created from a `Matrix` (up to 256 for now). Moreover, failing fast is the default behavior of `Matrix`
in other Continuous Delivery systems - see [GitHub Actions - Handling Failures in Matrix][ghm-failfast].

If needed, we can explore supporting other failure strategies later.
This is the right call - in Jenkins Pipeline, we inherited certain assumptions around failure behavior for parallel executions (both in a matrix and not) which resulted in us having to support making fail-fast configurable, and I don't think it was worth doing that in the end. When we're starting from scratch, like here, we have the opportunity to say "this is how things work" and adjust if sufficient demand comes up in the future, rather than prematurely optimizing. So, yeah. 👍
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abayer, dibyom, lbernick
In failure scenarios, the `TaskRuns` or `Runs` created from a `Matrix` will fail fast. That is, when
any `TaskRun` or `Run` from a given fanned-out `PipelineTask` fails or is cancelled then the other
`TaskRuns` or `Runs` from the same `PipelineTask` will be cancelled.
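
The fail-fast rule proposed above can be illustrated with a small model. This is only a sketch in Python, not Tekton's Go implementation: the `Status` enum and the `apply_fail_fast` function are hypothetical names introduced for illustration, and the runs are modelled as a plain mapping from name to status.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical, simplified states for a TaskRun or Run.
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"


def apply_fail_fast(statuses):
    """Return a new mapping with the proposed fail-fast rule applied.

    If any run from the fanned-out PipelineTask has failed or been
    cancelled, every sibling that is still running is marked cancelled.
    Runs that already finished keep their status.
    """
    tripped = any(
        s in (Status.FAILED, Status.CANCELLED) for s in statuses.values()
    )
    if not tripped:
        return dict(statuses)
    return {
        name: Status.CANCELLED if s is Status.RUNNING else s
        for name, s in statuses.items()
    }
```

For example, given one succeeded, one failed, and one running run, only the running one changes state (to cancelled); if nothing has failed or been cancelled, the mapping is returned unchanged.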
thanks @jerop, looking for more clarification: how are we failing already running `taskRuns` or `runs` from the same `pipelineTask`? Sending a termination signal?
Can we implement something like `stopping` mode by default, similar to the approach we take when a task in a pipeline fails?
the reason I am asking is I am assuming `stopping` is easier to implement 🤣 And does not result in partial execution ...
@pritidesai yes, similarly to how we cancel `TaskRuns` for example - it may be easier to implement `stopping` mode but wondering if that's the behavior we want for `Matrix` separate from the implementation - maybe partial execution could be the reason not to do this?
`stopping` will be consistent with how we address failure of a non-matrix `pipelineTask`.
Just an example, if building an image of one application fails, I do not want that failure to stop building images of 20 other applications.
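
To make the contrast with fail-fast concrete, here is a rough Python sketch of a `stopping`-like behavior as described in this thread. It assumes that, after a failure, runs already in progress are left to finish and only work that has not started yet is skipped. The state names (`"pending"`, `"skipped"`, and so on) and the `apply_stopping` function are hypothetical and do not come from Tekton's code.

```python
def apply_stopping(statuses):
    """Return a new mapping with a stopping-like rule applied.

    Assumption: once any run has failed, runs that are already running
    are left alone so they can complete, and runs that have not started
    ("pending") are skipped instead of being started.
    """
    failed = any(s == "failed" for s in statuses.values())
    if not failed:
        return dict(statuses)
    return {
        name: "skipped" if s == "pending" else s
        for name, s in statuses.items()
    }
```

In the image-building example, a failed build for one application would leave the builds already running for the other applications untouched.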
makes sense - added this to the API WG agenda on Monday so we can decide on the best way forward
/assign @abayer @lbernick @pritidesai @dibyom
[TEP-0090: Matrix][tep-0090] proposed executing a `PipelineTask` in parallel `TaskRuns` and `Runs` with substitutions from combinations of `Parameters` in a `Matrix`.

In this change, we propose a fail-fast strategy for handling failures in `TaskRuns` and `Runs` created from a `Matrix`. This makes it easier to control the execution of the many `TaskRuns` or `Runs` created from a `Matrix` - up to 256. This approach also aligns with the default behavior of `Matrix` in other Continuous Delivery systems, e.g. [GitHub Actions][ghm-failfast].

If needed, we can explore supporting other failure strategies later.

[tep-0090]: https://github.com/tektoncd/community/blob/main/teps/0090-matrix.md
[ghm-failfast]: https://docs.github.com/en/actions/using-jobs/using-a-matrix-for-your-jobs#handling-failures
Discussed at the API WG on 06/13 and we agreed on the following:
/close
@jerop: Closed this PR. In response to this:
In #4951, we implemented `isFailure` for matrixed `TaskRuns`, where we applied a fail-fast failure strategy.

We discussed failure strategies further in this PR - tektoncd/community#724 - and at the API WG on 13 June 2022. We agreed to leave early termination upon failure out of scope for the initial release of Matrix. We plan to explore failure strategies, including fail-fast, in future work, and these failure strategies may apply more broadly beyond Matrix.

In this change, we update `isFailure` to evaluate to `true` only when there's a failure and there are no running `TaskRuns` in the `rprt`.
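
The updated `isFailure` condition described above can be sketched as follows. This is an illustrative Python model, not the actual Go code in tektoncd/pipeline: the `is_failure` function and the string states are assumptions, and the `rprt` is reduced to a list of per-`TaskRun` states.

```python
def is_failure(task_run_states):
    """Model of the updated rule for a matrixed PipelineTask.

    Returns True only when at least one TaskRun has failed AND no
    TaskRun is still running, so a failure does not mark the whole
    PipelineTask as failed while siblings are in progress.
    """
    has_failure = any(s == "failed" for s in task_run_states)
    still_running = any(s == "running" for s in task_run_states)
    return has_failure and not still_running
```

With this rule, `is_failure(["failed", "running"])` is false while the second `TaskRun` is in progress, and becomes true once every `TaskRun` has finished and at least one has failed.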
/kind tep