createTaskRun and createCustomRun now use wait.ExponentialBackoff to retry
the creation of a taskRun or customRun when certain errors occur, specifically
webhook timeouts.
The function isWebhookTimeout checks whether an error is a mutating admission
webhook timeout by looking for HTTP 500 and the phrase "timeout" in the error
message.
If a webhook timeout is detected, the backoff loop will retry the creation
up to a configured number of steps, with increasing delay between attempts.
If the error is not a webhook timeout, the function will not retry and will
return the error immediately.
Errors that are not webhook timeouts, e.g. HTTP 400 bad request, validation
errors, etc., are not retried and will cause the taskRun creation to fail as
expected.
By default, the exponential backoff strategy is disabled. To enable this
feature, set `enable-wait-exponential-backoff` to `true` in the
feature-flags config map.
When enabled, the controller will use an exponential backoff strategy to retry
taskRun and customRun creation if it encounters transient errors such as
admission webhook timeouts.
This improves robustness against temporary webhook issues. If the feature flag
is set to false, the controller will not retry and will fail immediately on
such errors.
Configuration for the backoff parameters (duration, factor, steps, etc.) can be
set in the wait-exponential-backoff config map.
Signed-off-by: Priti Desai <[email protected]>
docs/additional-configs.md: 54 additions & 0 deletions
@@ -33,6 +33,7 @@ installation.
- [Pipelinerun with Affinity Assistant](#pipelineruns-with-affinity-assistant)
- [TaskRuns with `imagePullBackOff` Timeout](#taskruns-with-imagepullbackoff-timeout)
- [Disabling Inline Spec in TaskRun and PipelineRun](#disabling-inline-spec-in-taskrun-and-pipelinerun)
- [Exponential Backoff for TaskRun and CustomRun Creation](#exponential-backoff-for-taskrun-and-customrun-creation)
- [Next steps](#next-steps)
@@ -711,6 +712,59 @@ Inline specifications can be disabled for specific resources only. To achieve th
The default value of disable-inline-spec is "", which means inline specification is enabled in all cases.

## Exponential Backoff for TaskRun and CustomRun Creation

By default, when Tekton Pipelines attempts to create a TaskRun or CustomRun resource and encounters an error, the controller will requeue the PipelineRun for another reconciliation attempt after a short delay. However, this may not be sufficient in a heavily loaded cluster, where transient errors such as webhook timeouts or network issues are more likely to occur.

To improve robustness in environments where webhook timeouts or network errors are possible, you can enable an **exponential backoff retry strategy** for TaskRun and CustomRun creation by setting the `enable-wait-exponential-backoff` feature flag to `"true"` in the `feature-flags` ConfigMap:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags
  namespace: tekton-pipelines
data:
  enable-wait-exponential-backoff: "true"
```
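
As a rough illustration of how such a flag might be read, the following Go sketch parses the `data` section of the ConfigMap above. The helper name `backoffEnabled` is hypothetical, and treating a missing or unparsable value as disabled is an assumption based on the documented default, not necessarily how the controller is implemented.

```go
package main

import (
	"fmt"
	"strconv"
)

// flagName is the key documented for the feature-flags ConfigMap.
const flagName = "enable-wait-exponential-backoff"

// backoffEnabled is a hypothetical helper. It reports whether the flag is
// present and parses as true. A missing or unparsable value counts as
// disabled (an assumption that mirrors the documented default).
func backoffEnabled(data map[string]string) bool {
	raw, ok := data[flagName]
	if !ok {
		return false
	}
	enabled, err := strconv.ParseBool(raw)
	return err == nil && enabled
}

func main() {
	fmt.Println(backoffEnabled(map[string]string{flagName: "true"}))
	fmt.Println(backoffEnabled(map[string]string{}))
}
```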
When this flag is enabled, the controller will retry TaskRun and CustomRun creation using an exponential backoff strategy if it encounters an admission webhook timeout (detected as an HTTP 500 error whose message contains `timeout`).

This helps mitigate failures due to temporary unavailability of webhooks or network disruptions.

### Backoff Configuration

You can further customize the backoff parameters (such as initial duration, factor, steps, and cap) using the `wait-exponential-backoff` ConfigMap:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: wait-exponential-backoff
  namespace: tekton-pipelines
data:
  duration: "1s"
  factor: "2.0"
  jitter: "0.0"
  steps: "10"
  cap: "30s"
```

- **duration**: Initial wait time before the first retry.
- **factor**: Multiplier for each subsequent retry interval.
- **steps**: Maximum number of retry attempts.
- **cap**: Maximum wait time between retries.

### Default Behavior

If `enable-wait-exponential-backoff` is not set or is set to `"false"`, the controller will rely on its standard reconciliation loop to retry TaskRun or CustomRun creation after a short delay when failures occur due to webhook or network errors.

---

**Note:** This feature is especially useful in clusters where webhook services (such as Kyverno, OPA, or custom admission controllers) may be temporarily unavailable or slow to respond.

---

## Next steps

To get started with Tekton check the [Introductory tutorials][quickstarts],