
Conversation

@yaoxiaoqi
Member

@yaoxiaoqi yaoxiaoqi commented Feb 18, 2021

Changes

context.pipelineTask.retries exposes the total number of retries configured for a pipelineTask.

context.task.retry-count exposes the current retry count. The reason the
variable isn't named context.pipelineTask.retry-count is that every task has a retry
count, but only a pipelineTask has retries.
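
For illustration, here is a minimal sketch (not the code in this PR; the helper name applyRetriesSketch and the use of strings.ReplaceAll are assumptions) of the kind of substitution the pipelinerun reconciler performs for $(context.pipelineTask.retries) in a pipelineTask's string params:

package main

import (
	"fmt"
	"strconv"
	"strings"

	"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
)

// applyRetriesSketch is a hypothetical helper that replaces the literal
// "$(context.pipelineTask.retries)" in a PipelineTask's string params with
// the task's configured number of retries.
func applyRetriesSketch(pt *v1beta1.PipelineTask) *v1beta1.PipelineTask {
	out := pt.DeepCopy()
	retries := strconv.Itoa(pt.Retries)
	for i := range out.Params {
		// Only string params are handled in this sketch; array params are skipped.
		out.Params[i].Value.StringVal = strings.ReplaceAll(
			out.Params[i].Value.StringVal, "$(context.pipelineTask.retries)", retries)
	}
	return out
}

func main() {
	pt := &v1beta1.PipelineTask{
		Name:    "build",
		Retries: 3,
		Params: []v1beta1.Param{{
			Name:  "total-retries",
			Value: v1beta1.ArrayOrString{Type: v1beta1.ParamTypeString, StringVal: "$(context.pipelineTask.retries)"},
		}},
	}
	fmt.Println(applyRetriesSketch(pt).Params[0].Value.StringVal) // prints "3"
}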

Rename the function ApplyPipelineTaskContext to ApplyPipelineTaskStateContext in
"pipelinerun/apply.go" to distinguish it from ApplyPipelineTaskContexts.

related issue: #2725
TEP: https://github.com/tektoncd/community/blob/main/teps/0039-add-variable-retries-and-retrycount.md

/kind feature

Submitter Checklist

These are the criteria that every PR should meet; please check them off as you
review them:

  • Includes tests (if functionality changed/added)
  • Includes docs (if user facing)
  • Commit messages follow commit message best practices
  • Release notes block has been filled in or deleted (only if no user facing changes)

See the contribution guide for more details.


Reviewer Notes

If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.

Release Notes

Add variables context.pipelineTask.retries and context.task.retry-count

@tekton-robot tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Feb 18, 2021
@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 18, 2021
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File                                        Old Coverage   New Coverage   Delta
pkg/reconciler/pipelinerun/pipelinerun.go   83.8%          83.9%          0.0

@yaoxiaoqi
Member Author

yaoxiaoqi commented Feb 18, 2021

/assign @bobcatfish

@yaoxiaoqi
Member Author

/assign @sbwsg

@tekton-robot tekton-robot assigned ghost Feb 21, 2021
@ghost

ghost commented Feb 22, 2021

This looks pretty ready to me. I have two suggested changes:

  1. Could you add a section to docs/pipelineruns.md that describes usage of the variables and explains that they're only available in the single pipelineTask they refer to?
  2. Could you also add an example PipelineRun that uses retries under examples/v1beta1/pipelineruns?

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File                                        Old Coverage   New Coverage   Delta
pkg/reconciler/pipelinerun/pipelinerun.go   83.8%          83.9%          0.0

@yaoxiaoqi
Member Author

yaoxiaoqi commented Mar 4, 2021

  1. Could you add a section to docs/pipelineruns.md that describes usage of the variables and explains that they're only available in the single pipelineTask they refer to?

@sbwsg I added a section in docs/pipelines.md, since I found it hard to find the right place to insert the section in docs/pipelineruns.md. There is also more context in docs/pipelines.md, such as what retries is and how variable substitution works. Feel free to let me know if you still want a section in pipelineruns.md.

@ghost

ghost commented Mar 4, 2021

@sbwsg I added a section in docs/pipelines.md, since I found it hard to find the right place to insert the section in docs/pipelineruns.md. There is also more context in docs/pipelines.md, such as what retries is and how variable substitution works. Feel free to let me know if you still want a section in pipelineruns.md.

No that's ok - the docs you've added look good to me!

@yaoxiaoqi yaoxiaoqi force-pushed the add-retry-count branch 2 times, most recently from de961a9 to 07ce64a on March 5, 2021 06:47
@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 5, 2021
@ghost

ghost commented Mar 5, 2021

Thanks for this @yaoxiaoqi ! I left one more small comment but not a blocker.

@pritidesai pritidesai added this to the Pipelines 0.23 milestone Mar 9, 2021
Base automatically changed from master to main March 11, 2021 15:28
@yaoxiaoqi
Member Author

/retest

@yaoxiaoqi
Member Author

/test pull-tekton-pipeline-integration-tests

@ghost

ghost commented Mar 18, 2021

It looks like the cluster is so busy it's taking forever for the workspace-in-sidecar sidecar to start up, resulting in the examples test timing out.

/test pull-tekton-pipeline-integration-tests

@ghost

ghost commented Apr 30, 2021

Would another option be to have createTaskRun perform the retry-count substitution?

func (c *Reconciler) createTaskRun(ctx, rprt, pr, storageBase) {
    logger := logging.FromContext(ctx)
    tr, _ := c.taskRunLister.TaskRuns(pr.Namespace).Get(rprt.TaskRunName)
    if tr == nil {
        rprt.PipelineTask = resources.ApplyRetryCount(rprt.PipelineTask, 0)
    } else {
        rprt.PipelineTask = resources.ApplyRetryCount(rprt.PipelineTask, len(tr.Status.RetriesStatus))
    }
    // ...

Edit: I might be misunderstanding something fundamental that means this wouldn't work?

@pritidesai
Member

Would another option be to have createTaskRun perform the retry-count substitution?

func (c *Reconciler) createTaskRun(ctx, rprt, pr, storageBase) {
    logger := logging.FromContext(ctx)
    tr, _ := c.taskRunLister.TaskRuns(pr.Namespace).Get(rprt.TaskRunName)
    if tr == nil {
        rprt.PipelineTask = resources.ApplyRetryCount(rprt.PipelineTask, 0)
    } else {
        rprt.PipelineTask = resources.ApplyRetryCount(rprt.PipelineTask, len(tr.Status.RetriesStatus))
    }
    // ...

Edit: I might be misunderstanding something fundamental that means this wouldn't work?

Yeah, I don't think it would work, for this reason (I haven't tried it so I can't say for sure 😉):

When a task is being retried, createTaskRun only updates the condition in the status; the specification and parameters are not updated.

	tr, _ := c.taskRunLister.TaskRuns(pr.Namespace).Get(rprt.TaskRunName)
	if tr != nil {
		//is a retry
		addRetryHistory(tr)
		clearStatus(tr)
		tr.Status.SetCondition(&apis.Condition{
			Type:   apis.ConditionSucceeded,
			Status: corev1.ConditionUnknown,
		})
		return c.PipelineClientSet.TektonV1beta1().TaskRuns(pr.Namespace).UpdateStatus(ctx, tr, metav1.UpdateOptions{})
	}

@ghost

ghost commented May 3, 2021

@yaoxiaoqi what do you think of taking this approach so that all of the retry-related changes remain in the pipelinerun reconciler?

@dibyom dibyom modified the milestones: Pipelines 0.24, Pipelines v0.25 May 4, 2021
@yaoxiaoqi
Member Author

yaoxiaoqi commented May 6, 2021

Would another option be to have createTaskRun perform the retry-count substitution?

Hi @sbwsg , things would definitely be much cleaner if we aligned the replacement behavior of retries and retry-count. But retry-count is a little trickier to deal with because it increases by 1 every time a taskRun is retried. As I mentioned in the TEP, I chose $(context.task.retry-count) instead of $(context.pipelineTask.retry-count) because it lets us replace the variable at the right time, and it works fine.
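
To make the taskrun-reconciler approach concrete, here is a minimal sketch (the helper name applyRetryCountSketch is made up for illustration and is not the code in this PR). Because the taskrun reconciler runs again for every retry, it can derive the current count from the TaskRun's retry history and substitute it freshly each time:

package sketch

import (
	"strconv"
	"strings"

	"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
)

// applyRetryCountSketch replaces "$(context.task.retry-count)" in a TaskSpec's
// step scripts and args with the number of retries recorded so far. Each retry
// appends an entry to Status.RetriesStatus, so the length of that slice is the
// current retry count.
func applyRetryCountSketch(spec *v1beta1.TaskSpec, tr *v1beta1.TaskRun) *v1beta1.TaskSpec {
	out := spec.DeepCopy()
	count := strconv.Itoa(len(tr.Status.RetriesStatus))
	for i := range out.Steps {
		out.Steps[i].Script = strings.ReplaceAll(out.Steps[i].Script, "$(context.task.retry-count)", count)
		for j, arg := range out.Steps[i].Args {
			out.Steps[i].Args[j] = strings.ReplaceAll(arg, "$(context.task.retry-count)", count)
		}
	}
	return out
}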

Actually, I considered your option when designing this. But the problem, as pritidesai said, is that even though you update the parameters on the pipelineTask, you can't update the params on the taskRun, since we only replace the params once in this line. When the same taskRun is retried, we won't replace the params again.

If we want to update the parameters, we can't just call updateStatus when we find that a taskRun needs to retry; we need to update the whole taskRun to ensure the retry-count value is correct. I tried simply updating the whole taskRun, but interestingly the retry-count is sometimes less than the true value and sometimes equal to it. Besides, I'm also afraid there may be other side effects. I'd like to know your opinion: if you think retries and retry-count should be aligned, I will dig into the problem and try to figure out a solution.

original code:

tr, _ := c.taskRunLister.TaskRuns(pr.Namespace).Get(rprt.TaskRunName)
if tr != nil {
	// is a retry
	addRetryHistory(tr)
	clearStatus(tr)
	tr.Status.SetCondition(&apis.Condition{
		Type:   apis.ConditionSucceeded,
		Status: corev1.ConditionUnknown,
	})
	return c.PipelineClientSet.TektonV1beta1().TaskRuns(pr.Namespace).UpdateStatus(ctx, tr, metav1.UpdateOptions{})
}

I did something like this:

if tr != nil {
	// is a retry
	addRetryHistory(tr)
	clearStatus(tr)
	tr.Status.SetCondition(&apis.Condition{
		Type:   apis.ConditionSucceeded,
		Status: corev1.ConditionUnknown,
	})
	rprt.PipelineTask = resources.ApplyPipelineTaskContexts(rprt.PipelineTask, tr)
	tr, _ = c.PipelineClientSet.TektonV1beta1().TaskRuns(pr.Namespace).UpdateStatus(ctx, tr, metav1.UpdateOptions{})
	tr.Spec.Params = rprt.PipelineTask.Params
	return c.PipelineClientSet.TektonV1beta1().TaskRuns(pr.Namespace).Update(ctx, tr, metav1.UpdateOptions{})
}

@yaoxiaoqi
Member Author

  1. Replace retry-count with the number of attempts. Retry means to try executing again, and it does not make sense to initialize the counter when the task is executed for the first time. After it fails once, the task is retried, and hence the counter is set to 1 when it is executed a second time. By replacing retry-count with attempt, we can implement it as part of the pipeline controller instead of the taskrun controller.

Hi @pritidesai , I would like to know where I can get the attempt count of a TaskRun. I didn't find it in the taskRun structure.

@ghost

ghost commented May 6, 2021

I tried simply updating the whole taskRun, but interestingly the retry-count is sometimes less than the true value and sometimes equal to it.

Ahhhh ok, I didn't understand that this was happening. It's really interesting! I hadn't seen this behaviour before, but I can confirm it happens in my testing as well: I added a little bit of code to do the replacement in the PipelineRun reconciler and update the TaskRun's Spec.Params. I sometimes get the desired retry-count value, but I also sometimes just get a constant "1". This is really strange.

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 6, 2021
@ghost

ghost commented May 7, 2021

OK, I am happy to go ahead with the context.task.retry-count being replaced in the taskrun reconciler unless there are any objections from @bobcatfish or @pritidesai

There's one remaining nit I added to move the pipelineTask variable up to the pipelines section of docs/variables.md but otherwise this looks good to me.

/approve

@tekton-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbwsg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 7, 2021
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 13, 2021
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File                                        Old Coverage   New Coverage   Delta
pkg/reconciler/pipelinerun/pipelinerun.go   83.8%          83.9%          0.0

Add variables `context.pipelineTask.retries` and `context.task.retry-count`

`context.pipelineTask.retries` exposes the total number of retries configured for a pipelineTask.

`context.task.retry-count` exposes the current retry count. The reason the
variable isn't named `context.pipelineTask.retry-count` is that every task has a retry
count, but only a pipelineTask has retries.

Rename the function `ApplyPipelineTaskContext` to `ApplyPipelineTaskStateContext` in
"pipelinerun/apply.go" to distinguish it from `ApplyPipelineTaskContexts`.
@yaoxiaoqi
Member Author

There's one remaining nit I added to move the pipelineTask variable up to the pipelines section of docs/variables.md but otherwise this looks good to me.

Done

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File                                        Old Coverage   New Coverage   Delta
pkg/reconciler/pipelinerun/pipelinerun.go   83.8%          83.9%          0.0

@yaoxiaoqi
Member Author

/test pull-tekton-pipeline-integration-tests

@yaoxiaoqi yaoxiaoqi requested review from a user and bobcatfish May 14, 2021 08:44
@pritidesai
Member

sorry @yaoxiaoqi, I haven't been able to review this since the latest updates; I will review it later in the week 🙏

return c.PipelineClientSet.TektonV1beta1().TaskRuns(pr.Namespace).UpdateStatus(ctx, tr, metav1.UpdateOptions{})
}

rprt.PipelineTask = resources.ApplyPipelineTaskContexts(rprt.PipelineTask)
Member

@pritidesai pritidesai Jun 4, 2021

I am happy to merge this as is, but setting the context variable here will only make it available to a pipelineTask, not to a custom task. Since retries is constant and does not change during pipeline execution, we could set it while resolving a pipeline task.

@pritidesai
Member

/lgtm

thanks @yaoxiaoqi for your patience and all the work 🙏 The changes look much cleaner now 🎉

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 4, 2021
@pritidesai
Member

/test pull-tekton-pipeline-alpha-integration-tests
