Skip to content

Conversation

@jerop
Copy link
Member

@jerop jerop commented Aug 3, 2021

Changes

Finally Tasks are executed only when all Tasks are Done, where
Done means a Task is Skipped, Succeeded or Failed. But a
Task is considered Failed only when its Retries are all done.

Today, when a Task with Retries is Cancelled, it's not considered
Failed if its Retries are not yet done. This is problematic because
if a PipelineRun is Gracefully Cancelled, where it should run
Finally Tasks, and it has a running Task with Retries:

  • the Task with Retries will be cancelled, but won't be considered
    Failed because its Retries are not done
  • this would block the execution of Finally Tasks
  • the PipelineRun would hang (stay running) until it times out

In this change, we modify Is_Failure to consider that either all
Retries are done or Reason for failure is Cancellation. With this
change, we can unblock execution of the Finally Tasks during Graceful
Termination
.

In addition, the documentation notes that "Retries - Specifies the
number of times to retry the execution of a Task after a failure.
Does not apply to execution cancellations."

Fixes #4125

Related:

/kind bug

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Docs included if any changes are user facing
  • Tests included if any functionality added or changed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including
    functionality, content, code)
  • Release notes block below has been filled in or deleted (only if no user facing changes)

Release Notes

graceful termination of `pipelineruns` with running `taskruns` with `retries` will be observed, because cancelled `taskruns` with remaining `retries` will be considered `failed`

`Finally Tasks` are executed only when all `Tasks` are _Done_, where
_Done_ means a `Task` is _Skipped_, _Succeeded_ or _Failed_. But a
`Task` is considered _Failed_ only when its `Retries` are all done.

Today, when a `Task` with `Retries` is `Cancelled`, it's not considered
_Failed_ if its `Retries` are not yet done. This is problematic because
if a `PipelineRun` is _Gracefully Cancelled_, where it should run
`Finally Tasks`, and it has a running `Task` with `Retries`:
- the `Task` with `Retries` be cancelled, but won't be considered
_Failed_ because its `Retries` are not done
- this would block the execution of `Finally Tasks`
- the `PipelineRun` would hang (stay running) until it times out

In this change, we modify `Is_Failure` to consider that either all
`Retries` are done or `Reason` for failure is `Cancellation`. With this
change, we can unblock execution of the `Finally Tasks` during _Graceful
Termination_.

In addition, the documentation notes that "`Retries` - Specifies the
number of times to retry the execution of a Task after a failure.
_Does not apply to execution cancellations_."

Fixes #4125
@tekton-robot tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Aug 3, 2021
@tekton-robot tekton-robot requested review from a user and vdemeester August 3, 2021 17:53
@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 3, 2021
@jerop jerop marked this pull request as ready for review August 3, 2021 17:54
@tekton-robot tekton-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 3, 2021
@jerop jerop added this to the Pipelines v0.27 milestone Aug 3, 2021
@jerop
Copy link
Member Author

jerop commented Aug 3, 2021

@tektoncd/core-maintainers added this bug fix to the milestone for 0.27, please take a look :)


isDone := facts.checkTasksDone(d)
if d := cmp.Diff(isDone, tc.expected); d != "" {
if d := cmp.Diff(tc.expected, isDone); d != "" {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 nice catch

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 3, 2021
@jerop
Copy link
Member Author

jerop commented Aug 3, 2021

/test pull-tekton-pipeline-integration-tests

1 similar comment
@jerop
Copy link
Member Author

jerop commented Aug 3, 2021

/test pull-tekton-pipeline-integration-tests

@jerop jerop changed the title Cancelling a Task with Retries - recognize as Failure (for Graceful Termination) Graceful Termination: recognize _cancelled_ task with remaining retries as _failed_ Aug 3, 2021
@jerop jerop changed the title Graceful Termination: recognize _cancelled_ task with remaining retries as _failed_ Graceful Termination: recognize *cancelled* task with remaining retries as *failed* Aug 3, 2021
@jerop jerop changed the title Graceful Termination: recognize *cancelled* task with remaining retries as *failed* Graceful Termination: recognize cancelled task with remaining retries as failed Aug 3, 2021
@jerop jerop changed the title Graceful Termination: recognize cancelled task with remaining retries as failed Graceful Termination: cancelled task is failed (even with remaining retries) Aug 3, 2021
@jerop jerop changed the title Graceful Termination: cancelled task is failed (even with remaining retries) Graceful Termination: cancelled task is failed (even with retries) Aug 3, 2021
@jerop jerop changed the title Graceful Termination: cancelled task is failed (even with retries) Graceful Termination: Cancelled Task is Failed (even with Retries) Aug 3, 2021
Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 4, 2021
@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbwsg, vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot merged commit be3a9ff into tektoncd:main Aug 4, 2021
@jerop jerop deleted the finally-with-dag-retries branch June 11, 2022 01:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PipelineRun keep in CancelledRunningFinally status and finally task not be executed when updating running pipelineRun spec status to CancelledRunFinally

3 participants