-
Notifications
You must be signed in to change notification settings - Fork 1.8k
fix: Ensure retryable errors during validation do not fail Runs #8746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Ensure retryable errors during validation do not fail Runs #8746
Conversation
e4afe99
to
2e6116e
Compare
The following is the coverage report on the affected files.
|
/cc @vdemeester |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for this, can we also ensure an e2e test for this where a Run might fail deliberately on the first 2 tries but passes on the third try by design? Also it might be good to document this new transient error.
@waveywaves this is difficult but if there is prior art on it I can take a look. Would that be blocking?
Yes, that's a good idea. Do you know where would be appropriate to document this, outside of release notes? |
A test for this would be hard, and the change IS a small one. The test is not blocking I believe. I couldn't find any docs related to transient errors during pipeline resolution. There are multiple resolvers documented under docs/ but this behavior is not documented anywhere. There isn't any documentation related to remote resolution which can help the user understand which errors are considered transient in case their pipeline might be facing these which also communicate the robustness of remote resolution. We should add this documentation under each resolver or at the end of the resolved reference doc. Something like
I would prefer putting it under the docs/resolver-reference.md towards the end of the docover writing it in every resolver doc. |
2e6116e
to
7df2978
Compare
The following is the coverage report on the affected files.
|
/kind bug |
During resource creation/validation, retryale errors such as a k8s timeout should not result in the resource being marked as failed. To ensure this is the case, these errors must either be bubbled up or wrapped and cannot be un-wrapped into a new error using `retryableErr.Error()`. Additionally, IsTooManyRequests is now considered Transient (retryable).
7df2978
to
db12db2
Compare
The following is the coverage report on the affected files.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice 😬
errType = ErrCouldntValidateObjectRetryable | ||
} | ||
return fmt.Errorf("%w %s: %s", errType, objectName, err.Error()) | ||
return fmt.Errorf("%w %s: %w", errType, objectName, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vdemeester The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
thank you for the changes @aThorp96 /lgtm |
@vdemeester @aThorp96 Can we cherry-pick this bug to the |
/cherry-pick release-v1.0.x |
Changes
During resource creation/validation, retryale errors such as a k8s timeout should not result in the resource being marked as failed. To ensure this is the case, these errors must either be bubbled up or wrapped and cannot be un-wrapped into a new error using
retryableErr.Error()
.Additionally, IsTooManyRequests is now considered Transient (retryable).
/kind bug
Submitter Checklist
As the author of this PR, please check off the items in this checklist:
/kind <type>
. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tepRelease Notes