Skip to content

Conversation

ScrapCodes
Copy link
Contributor

@ScrapCodes ScrapCodes commented Aug 3, 2021

Changes

  1. Add field Retries to RunSpec, an integer count which acts as a FYI to
    custom task controller.
  2. Add a new RunRetry, in addition to RunCancelled status to RunSpecStatus
    i.e. v1alpha1.RunSpecStatusRetry
  3. Add a field RetriesStatus to RunStatusFields, to maintain the retry
    history for a Run, similar to v1beta1.TaskRunStatusFields.RetriesStatus
  4. Add a config map entry (default-short-timeout-seconds) to config-defaults in
    order to make short timeout configurable.

Proposed algorithm for performing a retry for custom task.

  • Step 1. A pipelineTask consisting of a custom task X, is configured with
    retries count.

  • Step 2. On failure of task X, pipelinerun controller sees a request for a
    retry. It then communicates the same to custom task Run by patching
    /spec/status with a v1alpha1.RunSpecStatusRetry i.e. RunRetry. Similar
    to request a custom task to cancel.

  • Step 3. In addition to patching the pipelinerun controller also enqueue a timer
    time.After(default-short-timeout-seconds) (default: 30 seconds).
    On completion of timeout (i.e. 30s), it checks if /spec/status is RunRetry,
    then it assumes that custom task does not support retry.

    • a) if custom task does not supports retry as above, It sets no. of retry done
      to the retries count configured - i.e. exhaust all retries.
    • b) if custom task does support retry, update retry history.
  • Step 4. The custom task that wants to support the retry, has to update

    • a) status.conditions to indicate it is Running.
    • b) clear /spec/status if it is RunRetry.

A task may retry and immediately fail, so controller cannot fully rely on
status.conditions.

Docs changes

  1. Removed the limitation that retries are not supported for custom task.
  2. Add a short user guide for custom task developer.

/kind feature

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Docs included if any changes are user facing
  • Tests included if any functionality added or changed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including
    functionality, content, code)
  • Release notes block below has been filled in or deleted (only if no user facing changes)

Release Notes

Custom task can be configured with retry. e.g.

            spec:
              pipelineSpec:
                tasks:
                  - name: task-loop-with-retry
                    retries: 2 # retries parameter of a PipelineTask in Pipeline definition.
                    runAfter:
                      - first-task
                    params:
                      - name: word
                        value:
                          - jump
                          - land
                          - roll
                      - name: suffix
                        value: ing
                    taskRef:
                      apiVersion: custom.tekton.dev/v1alpha1
                      kind: TaskLoop
                      name: simpletaskloop

 1. Add field `Retries` to `RunSpec`, an integer count which acts as a FYI to
    custom task controller.
 2. Add a new `RunRetry`, in addition to `RunCancelled` status to `RunSpecStatus`
    i.e. `v1alpha1.RunSpecStatusRetry`
 3. Add a field `RetriesStatus` to `RunStatusFields`, to maintain the retry
    history for a `Run`, similar to `v1beta1.TaskRunStatusFields.RetriesStatus`
 4. Add a config map entry (default-short-timeout-seconds) to `config-defaults` in
order to make short timeout configurable.

 Proposed algorithm for performing a retry for custom task.

 - Step 1. A `pipelineTask` consisting of a custom task X, is configured with
   `retries` count.

 - Step 2. On failure of task X, `pipelinerun` controller sees a request for a
   retry. It then communicates the same to custom task `Run` by patching
   `/spec/status` with a `v1alpha1.RunSpecStatusRetry` i.e. `RunRetry`. Similar
   to request a custom task to cancel.

 - Step 3. In addition to patching the `pipelinerun` controller also enqueue a timer
   `time.After(default-short-timeout-seconds)` (default: 30 seconds).
   On completion of timeout (i.e. 30s), it checks if `/spec/status` is `RunRetry`,
   then it assumes that custom task does not support retry.
     - a) if custom task does not supports retry as above, It sets no. of `retry done`
     to the `retries` count configured - i.e. exhaust all retries.
     - b) if custom task does support retry, update retry history.

 - Step 4. The custom task that wants to support the retry, has to update
   - a) `status.conditions` to indicate it is `Running`.
   - b) clear `/spec/status` if it is `RunRetry`.

 _A task may retry and immediately fail, so controller cannot fully rely on
 `status.conditions`._

 Docs changes
 1. Removed the limitation that retries are not supported for custom task.
 2. Add a short user guide for custom task developer.
@tekton-robot tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Aug 3, 2021
@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign imjasonh
You can assign the PR to them by writing /assign @imjasonh in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Aug 3, 2021
@ScrapCodes ScrapCodes marked this pull request as draft August 3, 2021 13:46
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/config/default.go 82.8% 73.5% -9.2
pkg/apis/pipeline/v1alpha1/run_types.go 43.8% 47.1% 3.3
pkg/apis/pipeline/v1beta1/pipeline_types.go 95.9% 95.8% -0.1
pkg/reconciler/pipelinerun/pipelinerun.go 83.4% 83.8% 0.4
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 92.2% 92.3% 0.1
pkg/reconciler/pipelinerun/resources/pipelinerunstate.go 73.4% 73.5% 0.0

@ScrapCodes
Copy link
Contributor Author

Currently, I am using time.AfterFunc in this PR. Which is not so ideal performance wise, once the PR #4131 is merged, I will be able to switch to NewRequeueKey .

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 7, 2021
@tekton-robot
Copy link
Collaborator

@ScrapCodes: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ScrapCodes ScrapCodes closed this Sep 16, 2021
@afrittoli
Copy link
Member

Thanks @ScrapCodes - I look forward to the new PR then!

@ScrapCodes
Copy link
Contributor Author

Yeah @afrittoli, I have proposed a new approach see tektoncd/community#491

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/feature Categorizes issue or PR as related to a new feature. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants