-
Notifications
You must be signed in to change notification settings - Fork 4k
xds: xdsClient caches transient error for new watchers #12262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -676,6 +676,8 @@ private final class ResourceSubscriber<T extends ResourceUpdate> { | |||||||||||||||||||||||||||||||||
| private ResourceMetadata metadata; | ||||||||||||||||||||||||||||||||||
| @Nullable | ||||||||||||||||||||||||||||||||||
| private String errorDescription; | ||||||||||||||||||||||||||||||||||
| @Nullable | ||||||||||||||||||||||||||||||||||
| private Status lastError; | ||||||||||||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unit tests?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unit tests are needed for this? I don't know how I replicate this behaviour. The logic sounds well, we should probably let it merge and see?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unit tests prevent breakages in the future from new code changes.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A unit test shouldn't be too hard. We find a test that is triggering So as "the first one I noticed" there's ldsResourceUpdated_withXdstpResourceName_withUnknownAuthority(). grpc-java/xds/src/test/java/io/grpc/xds/GrpcXdsClientImplTestBase.java Lines 680 to 695 in 6462ef9
You could create a
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not confident with this unit test since we are not checking the instance to be same. It seems to be getting caught in
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This PR introduces
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Your PR introduces
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks like none of the tests are stopping at that break point from GrpcXdsClientImplTestBase class.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh I was keeping the break point inside onError where savedError was getting defined. My bad. I made a new commit. PTAL. |
||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| ResourceSubscriber(XdsResourceType<T> type, String resource) { | ||||||||||||||||||||||||||||||||||
| syncContext.throwIfNotInThisSynchronizationContext(); | ||||||||||||||||||||||||||||||||||
|
|
@@ -712,11 +714,16 @@ void addWatcher(ResourceWatcher<T> watcher, Executor watcherExecutor) { | |||||||||||||||||||||||||||||||||
| watchers.put(watcher, watcherExecutor); | ||||||||||||||||||||||||||||||||||
| T savedData = data; | ||||||||||||||||||||||||||||||||||
| boolean savedAbsent = absent; | ||||||||||||||||||||||||||||||||||
| Status savedError = lastError; | ||||||||||||||||||||||||||||||||||
| watcherExecutor.execute(() -> { | ||||||||||||||||||||||||||||||||||
| if (errorDescription != null) { | ||||||||||||||||||||||||||||||||||
| watcher.onError(Status.INVALID_ARGUMENT.withDescription(errorDescription)); | ||||||||||||||||||||||||||||||||||
| return; | ||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||
| if (savedError != null) { | ||||||||||||||||||||||||||||||||||
| watcher.onError(savedError); | ||||||||||||||||||||||||||||||||||
| return; | ||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||
| if (savedData != null) { | ||||||||||||||||||||||||||||||||||
| notifyWatcher(watcher, savedData); | ||||||||||||||||||||||||||||||||||
| } else if (savedAbsent) { | ||||||||||||||||||||||||||||||||||
|
|
@@ -808,6 +815,7 @@ void onData(ParsedResource<T> parsedResource, String version, long updateTime, | |||||||||||||||||||||||||||||||||
| this.metadata = ResourceMetadata | ||||||||||||||||||||||||||||||||||
| .newResourceMetadataAcked(parsedResource.getRawResource(), version, updateTime); | ||||||||||||||||||||||||||||||||||
| absent = false; | ||||||||||||||||||||||||||||||||||
| lastError = null; | ||||||||||||||||||||||||||||||||||
| if (resourceDeletionIgnored) { | ||||||||||||||||||||||||||||||||||
| logger.log(XdsLogLevel.FORCE_INFO, "xds server {0}: server returned new version " | ||||||||||||||||||||||||||||||||||
| + "of resource for which we previously ignored a deletion: type {1} name {2}", | ||||||||||||||||||||||||||||||||||
|
|
@@ -857,6 +865,7 @@ void onAbsent(@Nullable ProcessingTracker processingTracker, ServerInfo serverIn | |||||||||||||||||||||||||||||||||
| if (!absent) { | ||||||||||||||||||||||||||||||||||
| data = null; | ||||||||||||||||||||||||||||||||||
| absent = true; | ||||||||||||||||||||||||||||||||||
| lastError = null; | ||||||||||||||||||||||||||||||||||
| metadata = serverInfo.resourceTimerIsTransientError() | ||||||||||||||||||||||||||||||||||
| ? ResourceMetadata.newResourceMetadataTimeout() | ||||||||||||||||||||||||||||||||||
| : ResourceMetadata.newResourceMetadataDoesNotExist(); | ||||||||||||||||||||||||||||||||||
|
|
@@ -894,6 +903,7 @@ void onError(Status error, @Nullable ProcessingTracker tracker) { | |||||||||||||||||||||||||||||||||
| Status errorAugmented = Status.fromCode(error.getCode()) | ||||||||||||||||||||||||||||||||||
| .withDescription(description + "nodeID: " + bootstrapInfo.node().getId()) | ||||||||||||||||||||||||||||||||||
| .withCause(error.getCause()); | ||||||||||||||||||||||||||||||||||
| this.lastError = errorAugmented; | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| for (ResourceWatcher<T> watcher : watchers.keySet()) { | ||||||||||||||||||||||||||||||||||
| if (tracker != null) { | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: Change data to
StatusOr<T> datawould have kept this a bit more clear. You still have data vs absent, but at least you don't have yet another thing to deal with.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned that it's a more invasive refactoring change since it would require touching every method that uses the data field. I can try to do it in a follow up PR and keep this PR's scope limited to bug fix?
I'll agree that we are moving to more usage of
StatusOr<T>in our codebase everywhere. We should consider here as well for consistency.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's only a few other places that access
data. It does look like it'd make all of them more complicated. Most are of the formdata != null, and would likely need to becomedata != null && data.hasValue().