You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Metadata requests: Fixes bug where certain metadata requests are not retried with a client cold start with only query requests (#5108)
# Pull Request Template
## Description
### Context
Currently, there is a bug in the SDK where upon a cold start of the SDK
and rare edge cases involving online/offline-ing regions, where only
query requests are made, the SDK will not retry certain status code
responses from metadata requests causing the entire request to fail. The
correct behavior would be for the SDK to do cross region retries on
these metadata requests.
This pulls request includes several updates to enhance error handling
and retry logic in the Cosmos DB SDK. The changes mainly focus on
extending support for additional server error types and improving retry
policies for various scenarios.
### Improvements to retry logic:
*
[`ClientRetryPolicy.cs`](diffhunk://#diff-2b056512ca285b1d95e025e31f60345059fa92d958becc38f90a6fb54ce1bbb4R331-R341):
Enhanced retry logic to handle `InternalServerError` ,
`DatabaseAccountNotFound`, and `LeaseNotFound` status codes.
*
[`MetadataRequestThrottleRetryPolicy.cs`](diffhunk://#diff-a5ed5985909c3dcb6e4ce186cdd662d590dac5297ea14e68560c7d1eca307be4L26-R28):
Refactored retry policy to handle additional status codes and renamed
methods and constants to reflect the broader scope of endpoint
unavailability.
### FaultInjection enhancements to error handling testing:
*
[`FaultInjectionRuleBuilder.cs`](diffhunk://#diff-d827164a4a6a0d8737e6598f8132c915ef48a1fc01daaa6422706f770dada5d5L152-R156):
Added support for additional server error types such as
`DatabaseAccountNotFound`, `ServiceUnavailable`, `InternalServerError`,
and `LeaseNotFound` in the for metadata requests.
*
[`FaultInjectionServerErrorType.cs`](diffhunk://#diff-0c89faa9a48c428a7a98662d995474e34295618ac60e677ad9762fd048f33601L75-R82):
Updated the `FaultInjectionServerErrorType` enum to include
`LeaseNotFound` and corrected the status code for
`DatabaseAccountNotFound`.
*
[`FaultInjectionServerErrorResultInternal.cs`](diffhunk://#diff-1ae8256c6d505a8f3b0a350978a0cc9a08f6234f7328f28f1db14302ee691d72L473-R473):
Added handling for `LeaseNotFound` and updated the status code for
`DatabaseAccountNotFound` in the `GetInjectedServerError` method.
*
### Testing updates:
*
[`CosmosItemIntegrationTests.cs`](diffhunk://#diff-16d429adf686a32936696d2014afab3fc8faf91f10c880850fb8b30f8b96bb33R153-R214):
Added a new test method
`MetadataEndpointUnavailableCrossRegionalRetryTest` to validate the
retry logic for various server error types.
*
[`ClientRetryPolicyTests.cs`](diffhunk://#diff-d3fdfdc5d4f4d8af2c2cc463d928285680e7695861422ef2e3330d1a956807e1L167-R176):
Extended existing tests to cover additional status codes and substatus
codes.
## Type of change
Please delete options that are not relevant.
- [] Bug fix (non-breaking change which fixes an issue)
## Closing issues
To automatically close an issue: closes#4710
---------
Co-authored-by: Kiran Kumar Kolli <[email protected]>
/// For a ServiceUnavailable (503.0) we could be having a timeout from Direct/TCP locally or a request to Gateway request with a similar response due to an endpoint not yet available.
475
483
/// We try and retry the request only if there are other regions available. The retry logic is applicable for single master write accounts as well.
484
+
/// Other status codes include InternalServerError (500.0) and LeaseNotFound (410.1022).
DefaultTrace.TraceWarning("MetadataRequestThrottleRetryPolicy: Retry count: {0} has exceeded the maximum permitted retry count on service unavailable: {1}.",this.serviceUnavailableRetryCount,this.maxServiceUnavailableRetryCount);
208
+
DefaultTrace.TraceWarning("MetadataRequestThrottleRetryPolicy: Retry count: {0} has exceeded the maximum permitted retry count on unavailable endpoint: {1}.",this.unavailableEndpointRetryCount,this.maxUnavailableEndpointRetryCount);
157
209
returnfalse;
158
210
}
159
211
160
212
// Retrying on second PreferredLocations.
161
213
// RetryCount is used as zero-based index.
162
-
DefaultTrace.TraceWarning("MetadataRequestThrottleRetryPolicy: Incrementing the metadata retry location index to: {0}.",this.serviceUnavailableRetryCount);
214
+
DefaultTrace.TraceWarning("MetadataRequestThrottleRetryPolicy: Incrementing the metadata retry location index to: {0}.",this.unavailableEndpointRetryCount);
0 commit comments