Commit 149f686

[Internal] Circuit Breaker: Adds Code to Implement Per Partition Circuit Breaker (#5023)
# Pull Request Template

## Description

The idea of the per-partition circuit breaker (PPCB) is to optimize **a)** read availability in a single master account and **b)** read and write availability in a multi master account while a specific partition in one of the regions is experiencing an outage or quorum loss. This feature is independent of the partition level failover triggered by the backend. The per-partition circuit breaker is developed behind the feature flag `AZURE_COSMOS_CIRCUIT_BREAKER_ENABLED`. However, when partition level failover is enabled, PPCB is enabled by default so that reads can benefit from it.

## Scope

- For single master, only the read requests use the circuit breaker to add the pk-range to region override mapping, and use this mapping as the source of truth to route the read requests.
- For multi master, both the read and write requests use the circuit breaker to add the pk-range to region override mapping, and use this mapping as the source of truth to route both read and write requests.

## Understanding the Configurations exposed by the environment variables:

- `AZURE_COSMOS_CIRCUIT_BREAKER_ENABLED`: Enables or disables the partition level circuit breaker feature. The default value is `false`.
- `AZURE_COSMOS_PPCB_STALE_PARTITION_UNAVAILABILITY_REFRESH_INTERVAL_IN_SECONDS`: Sets the interval of the background periodic address refresh task. The default value is `60` seconds.
- `AZURE_COSMOS_PPCB_ALLOWED_PARTITION_UNAVAILABILITY_DURATION_IN_SECONDS`: Sets the partition unavailability duration in seconds, that is, how long a partition can remain unhealthy before the SDK re-validates its connection status. The default value is `5` seconds.
- `AZURE_COSMOS_PPCB_CONSECUTIVE_FAILURE_COUNT_FOR_READS`: Sets the number of consecutive read failures that triggers the per-partition circuit breaker flow. The default value is `10` consecutive failures within a 1-minute window.
- `AZURE_COSMOS_PPCB_CONSECUTIVE_FAILURE_COUNT_FOR_WRITES`: Sets the number of consecutive write failures that triggers the per-partition circuit breaker flow. The default value is `5` consecutive failures within a 1-minute window.

## Understanding the Working Principle:

On a high level, there are three parts of the circuit breaker implementation:

- **Short Circuit and Failover detection:** The failover detection logic resides in the SDK `ClientRetryPolicy`, just like the existing logic for per-partition automatic failover (PPAF). The detection logic is based on the following two principles:
  - **Status Codes:** The status codes that indicate a partition level circuit breaker condition are: a) `503` Service Unavailable, b) `408` Request Timeout, c) cancellation token expired.
  - **Threshold:** Once the failover condition is met, the SDK counts consecutive failures until they reach a configured threshold. Once this threshold is met, the SDK fails over the read requests to the next preferred region for the offending partition. For example, if the threshold value for read requests is `10`, the SDK looks for `10` consecutive failures. If the threshold is met or exceeded, the SDK adds the region failover information for that partition.
- **Failover a faulty partition to the next preferred region:** Once the failover conditions are met, the `ClientRetryPolicy` triggers a partition level override using `GlobalPartitionEndpointManagerCore.TryMarkEndpointUnavailableForPartitionKeyRange` to the next region in the preferred region list. This failover information helps the current request, as well as subsequent requests (reads in single master, and both reads and writes in multi master), route to the next region.
- **Failback the faulty partition to its original first preferred region:** With PPAF enabled, the write requests ideally rely on the 403.3 (Write Forbidden) signal to fail the partition back to the primary write region. However, this is not true for reads, which means the SDK has no definitive signal to identify when to initiate a failback for read requests. Hence, the idea is to create a background task at the time of read failover, which keeps track of the pk-range and region mapping. The task periodically fetches the addresses from the gateway address cache for those pk-ranges in the faulty region and tries to open RNTBD connections to all 4 replicas of that partition. The RNTBD open connection attempt is made in the same way as in the replica validation flow. The background task is started during a failover and remains alive until the SDK is disposed. If the connection attempt to all 4 replicas succeeds, the task removes or overrides the entry with the primary region, causing the SDK to fail the read requests back.

## Type of change

- [x] New feature (non-breaking change which adds functionality)

## Closing issues

To automatically close an issue: closes #4981
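
As an illustration of the configuration described above, the following is a minimal sketch of reading the five environment variables with their documented defaults. `CircuitBreakerSettings`, its property names, and the fall-back-on-invalid-value behaviour are assumptions made for this example; the SDK's own `ConfigurationManager` may parse these values differently.

```csharp
using System;

// Hypothetical helper, not part of the SDK. It reads the PPCB settings named in the
// description and falls back to the documented default when a value is unset or invalid.
public sealed class CircuitBreakerSettings
{
    public bool Enabled { get; private set; }
    public TimeSpan StalePartitionRefreshInterval { get; private set; }
    public TimeSpan AllowedPartitionUnavailability { get; private set; }
    public int ConsecutiveFailureCountForReads { get; private set; }
    public int ConsecutiveFailureCountForWrites { get; private set; }

    // The lookup delegate is injected so the sketch can be exercised without
    // touching the real process environment.
    public static CircuitBreakerSettings FromEnvironment(Func<string, string> lookup)
    {
        return new CircuitBreakerSettings
        {
            Enabled = ReadBool(lookup, "AZURE_COSMOS_CIRCUIT_BREAKER_ENABLED", false),
            StalePartitionRefreshInterval = TimeSpan.FromSeconds(ReadInt(
                lookup, "AZURE_COSMOS_PPCB_STALE_PARTITION_UNAVAILABILITY_REFRESH_INTERVAL_IN_SECONDS", 60)),
            AllowedPartitionUnavailability = TimeSpan.FromSeconds(ReadInt(
                lookup, "AZURE_COSMOS_PPCB_ALLOWED_PARTITION_UNAVAILABILITY_DURATION_IN_SECONDS", 5)),
            ConsecutiveFailureCountForReads = ReadInt(
                lookup, "AZURE_COSMOS_PPCB_CONSECUTIVE_FAILURE_COUNT_FOR_READS", 10),
            ConsecutiveFailureCountForWrites = ReadInt(
                lookup, "AZURE_COSMOS_PPCB_CONSECUTIVE_FAILURE_COUNT_FOR_WRITES", 5),
        };
    }

    // Assumption: non-numeric or non-positive values are ignored in favour of the default.
    private static int ReadInt(Func<string, string> lookup, string name, int defaultValue)
    {
        return int.TryParse(lookup(name), out int value) && value > 0 ? value : defaultValue;
    }

    // Assumption: anything other than "true"/"false" (case-insensitive) keeps the default.
    private static bool ReadBool(Func<string, string> lookup, string name, bool defaultValue)
    {
        return bool.TryParse(lookup(name), out bool value) ? value : defaultValue;
    }
}
```

In an application, the lookup would typically be `Environment.GetEnvironmentVariable`, for example `CircuitBreakerSettings.FromEnvironment(Environment.GetEnvironmentVariable)`.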
1 parent fed8be3 commit 149f686

17 files changed, with 2122 additions and 182 deletions.

Microsoft.Azure.Cosmos/src/ClientRetryPolicy.cs

Lines changed: 66 additions & 26 deletions
@@ -28,7 +28,7 @@ internal sealed class ClientRetryPolicy : IDocumentClientRetryPolicy
         private readonly GlobalEndpointManager globalEndpointManager;
         private readonly GlobalPartitionEndpointManager partitionKeyRangeLocationCache;
         private readonly bool enableEndpointDiscovery;
-        private readonly bool isPertitionLevelFailoverEnabled;
+        private readonly bool isPartitionLevelFailoverEnabled;
         private int failoverRetryCount;

         private int sessionTokenRetryCount;
@@ -45,7 +45,7 @@ public ClientRetryPolicy(
             GlobalPartitionEndpointManager partitionKeyRangeLocationCache,
             RetryOptions retryOptions,
             bool enableEndpointDiscovery,
-            bool isPertitionLevelFailoverEnabled)
+            bool isPartitionLevelFailoverEnabled)
         {
             this.throttlingRetry = new ResourceThrottleRetryPolicy(
                 retryOptions.MaxRetryAttemptsOnThrottledRequests,
@@ -59,7 +59,7 @@ public ClientRetryPolicy(
             this.serviceUnavailableRetryCount = 0;
             this.canUseMultipleWriteLocations = false;
             this.isMultiMasterWriteRequest = false;
-            this.isPertitionLevelFailoverEnabled = isPertitionLevelFailoverEnabled;
+            this.isPartitionLevelFailoverEnabled = isPartitionLevelFailoverEnabled;
         }

         /// <summary>
@@ -80,13 +80,9 @@ public async Task<ShouldRetryResult> ShouldRetryAsync(
                     this.documentServiceRequest?.RequestContext?.LocationEndpointToRoute?.ToString() ?? string.Empty,
                     this.documentServiceRequest?.ResourceAddress ?? string.Empty);

-                if (this.isPertitionLevelFailoverEnabled)
-                {
-                    // In the event of the routing gateway having outage on region A, mark the partition as unavailable assuming that the
-                    // partition has been failed over to region B, when per partition automatic failover is enabled.
-                    this.partitionKeyRangeLocationCache.TryMarkEndpointUnavailableForPartitionKeyRange(
-                        this.documentServiceRequest);
-                }
+                // In the event of the routing gateway having outage on region A, mark the partition as unavailable assuming that the
+                // partition has been failed over to region B, when per partition automatic failover is enabled.
+                this.TryMarkEndpointUnavailableForPkRange(isSystemResourceUnavailableForWrite: false);

                 // Mark both read and write requests because it gateway exception.
                 // This means all requests going to the region will fail.
@@ -113,7 +109,7 @@ public async Task<ShouldRetryResult> ShouldRetryAsync(
                         StatusCodes.TooManyRequests, SubStatusCodes.SystemResourceUnavailable);

                     return this.TryMarkEndpointUnavailableForPkRangeAndRetryOnServiceUnavailable(
-                        shouldMarkEndpointUnavailableForPkRange: true);
+                        isSystemResourceUnavailableForWrite: true);
                 }

                 ShouldRetryResult shouldRetryResult = await this.ShouldRetryInternalAsync(
@@ -176,7 +172,7 @@ public async Task<ShouldRetryResult> ShouldRetryAsync(
                     StatusCodes.TooManyRequests, SubStatusCodes.SystemResourceUnavailable);

                 return this.TryMarkEndpointUnavailableForPkRangeAndRetryOnServiceUnavailable(
-                    shouldMarkEndpointUnavailableForPkRange: true);
+                    isSystemResourceUnavailableForWrite: true);
             }

             return await this.throttlingRetry.ShouldRetryAsync(cosmosResponseMessage, cancellationToken);
@@ -236,8 +232,7 @@ private async Task<ShouldRetryResult> ShouldRetryInternalAsync(
                     this.documentServiceRequest?.ResourceAddress ?? string.Empty);

                 // Mark the partition key range as unavailable to retry future request on a new region.
-                this.partitionKeyRangeLocationCache.TryMarkEndpointUnavailableForPartitionKeyRange(
-                    this.documentServiceRequest);
+                this.TryMarkEndpointUnavailableForPkRange(isSystemResourceUnavailableForWrite: false);
             }

             // Received 403.3 on write region, initiate the endpoint rediscovery
@@ -313,7 +308,7 @@ private async Task<ShouldRetryResult> ShouldRetryInternalAsync(
             if (statusCode == HttpStatusCode.ServiceUnavailable)
             {
                 return this.TryMarkEndpointUnavailableForPkRangeAndRetryOnServiceUnavailable(
-                    shouldMarkEndpointUnavailableForPkRange: true);
+                    isSystemResourceUnavailableForWrite: false);
             }

             return null;
@@ -442,23 +437,18 @@ private ShouldRetryResult ShouldRetryOnSessionNotAvailable(DocumentServiceReques
         /// Service Unavailable response is received, indicating that the service might be temporarily unavailable.
         /// It optionally marks the partition key range as unavailable, which will influence future routing decisions.
         /// </summary>
-        /// <param name="shouldMarkEndpointUnavailableForPkRange">A boolean flag indicating whether the endpoint for the
-        /// current partition key range should be marked as unavailable.</param>
+        /// <param name="isSystemResourceUnavailableForWrite">A boolean flag indicating whether the endpoint for the
+        /// current partition key range should be marked as unavailable, if the failure happened due to system
+        /// resource unavailability.</param>
         /// <returns>An instance of <see cref="ShouldRetryResult"/> indicating whether the operation should be retried.</returns>
         private ShouldRetryResult TryMarkEndpointUnavailableForPkRangeAndRetryOnServiceUnavailable(
-            bool shouldMarkEndpointUnavailableForPkRange)
+            bool isSystemResourceUnavailableForWrite)
         {
             DefaultTrace.TraceWarning("ClientRetryPolicy: ServiceUnavailable. Refresh cache and retry. Failed Location: {0}; ResourceAddress: {1}",
                 this.documentServiceRequest?.RequestContext?.LocationEndpointToRoute?.ToString() ?? string.Empty,
                 this.documentServiceRequest?.ResourceAddress ?? string.Empty);

-            if (shouldMarkEndpointUnavailableForPkRange)
-            {
-                // Mark the partition as unavailable.
-                // Let the ClientRetry logic decide if the request should be retried
-                this.partitionKeyRangeLocationCache.TryMarkEndpointUnavailableForPartitionKeyRange(
-                    this.documentServiceRequest);
-            }
+            this.TryMarkEndpointUnavailableForPkRange(isSystemResourceUnavailableForWrite);

             return this.ShouldRetryOnServiceUnavailable();
         }
@@ -477,7 +467,7 @@ private ShouldRetryResult ShouldRetryOnServiceUnavailable()

             if (!this.canUseMultipleWriteLocations
                 && !this.isReadRequest
-                && !this.isPertitionLevelFailoverEnabled)
+                && !this.isPartitionLevelFailoverEnabled)
             {
                 // Write requests on single master cannot be retried if partition level failover is disabled.
                 // This means there are no other regions available to serve the writes.
@@ -506,6 +496,30 @@ private ShouldRetryResult ShouldRetryOnServiceUnavailable()
             return ShouldRetryResult.RetryAfter(TimeSpan.Zero);
         }

+        /// <summary>
+        /// Attempts to mark the endpoint associated with the current partition key range as unavailable
+        /// which will influence future routing decisions.
+        /// </summary>
+        /// <param name="isSystemResourceUnavailableForWrite">A boolean flag indicating if the system resource was unavailable. If true,
+        /// the endpoint will be marked unavailable for the pk-range of a multi master write request, bypassing the circuit breaker check.</param>
+        /// <returns>A boolean flag indicating whether the endpoint was marked as unavailable.</returns>
+        private bool TryMarkEndpointUnavailableForPkRange(
+            bool isSystemResourceUnavailableForWrite)
+        {
+            if (this.documentServiceRequest != null
+                && (isSystemResourceUnavailableForWrite
+                || this.IsRequestEligibleForPerPartitionAutomaticFailover()
+                || this.IsRequestEligibleForPartitionLevelCircuitBreaker()))
+            {
+                // Mark the partition as unavailable.
+                // Let the ClientRetry logic decide if the request should be retried
+                return this.partitionKeyRangeLocationCache.TryMarkEndpointUnavailableForPartitionKeyRange(
+                    request: this.documentServiceRequest);
+            }
+
+            return false;
+        }
+
         /// <summary>
         /// Returns a boolean flag indicating if the endpoint should be marked as unavailable
         /// due to a 429 response with a sub status code of 3092 (system resource unavailable).
@@ -524,6 +538,32 @@ private bool ShouldMarkEndpointUnavailableOnSystemResourceUnavailableForWrite(
                 && subStatusCode == SubStatusCodes.SystemResourceUnavailable;
         }

+        /// <summary>
+        /// Determines if a request is eligible for per-partition automatic failover.
+        /// A request is eligible if it is a write request, partition level failover is enabled,
+        /// and the global endpoint manager cannot use multiple write locations for the request.
+        /// </summary>
+        /// <returns>True if the request is eligible for per-partition automatic failover, otherwise false.</returns>
+        private bool IsRequestEligibleForPerPartitionAutomaticFailover()
+        {
+            return this.partitionKeyRangeLocationCache.IsRequestEligibleForPerPartitionAutomaticFailover(
+                this.documentServiceRequest);
+        }
+
+        /// <summary>
+        /// Determines if a request is eligible for partition-level circuit breaker.
+        /// This method checks if the request is a read-only request or a multi master write request, if partition-level circuit breaker is enabled,
+        /// and if the partition key range location cache indicates that the partition can fail over based on the number of request failures.
+        /// </summary>
+        /// <returns>
+        /// True if the read request is eligible for partition-level circuit breaker, otherwise false.
+        /// </returns>
+        private bool IsRequestEligibleForPartitionLevelCircuitBreaker()
+        {
+            return this.partitionKeyRangeLocationCache.IsRequestEligibleForPartitionLevelCircuitBreaker(this.documentServiceRequest)
+                && this.partitionKeyRangeLocationCache.IncrementRequestFailureCounterAndCheckIfPartitionCanFailover(this.documentServiceRequest);
+        }
+
         private sealed class RetryContext
         {
             public int RetryLocationIndex { get; set; }
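
The `&&` in `IsRequestEligibleForPartitionLevelCircuitBreaker` means the failure counter is only incremented for requests that are eligible in the first place. The following is a minimal, single-threaded sketch of that idea together with the consecutive-failure threshold from the description. `ConsecutiveFailureTracker` and `CircuitBreakerGate` are hypothetical names for this example, and the rule that a gap longer than one minute resets the streak is an assumption about what "within a 1-minute window" means; the SDK's actual counter behind `IncrementRequestFailureCounterAndCheckIfPartitionCanFailover` is not shown in this diff and may differ.

```csharp
using System;

// Hypothetical illustration only, not the SDK implementation. Not thread-safe.
public sealed class ConsecutiveFailureTracker
{
    private readonly int readThreshold;
    private readonly int writeThreshold;
    private readonly TimeSpan window;
    private int readFailures;
    private int writeFailures;
    private DateTime lastFailureUtc = DateTime.MinValue;

    public ConsecutiveFailureTracker(int readThreshold = 10, int writeThreshold = 5, TimeSpan? window = null)
    {
        this.readThreshold = readThreshold;
        this.writeThreshold = writeThreshold;
        this.window = window ?? TimeSpan.FromMinutes(1);
    }

    // Records one failure and returns true once the matching threshold is reached.
    public bool RecordFailureAndCheck(bool isReadRequest, DateTime nowUtc)
    {
        // Assumption: a gap longer than the window breaks the "consecutive" streak.
        if (nowUtc - this.lastFailureUtc > this.window)
        {
            this.readFailures = 0;
            this.writeFailures = 0;
        }

        this.lastFailureUtc = nowUtc;

        if (isReadRequest)
        {
            this.readFailures++;
            return this.readFailures >= this.readThreshold;
        }

        this.writeFailures++;
        return this.writeFailures >= this.writeThreshold;
    }
}

public static class CircuitBreakerGate
{
    // Mirrors the shape of the `&&` in the diff: when the request is not eligible,
    // the right-hand side never runs, so the counter is left untouched.
    public static bool ShouldFailOver(
        bool isEligible, ConsecutiveFailureTracker tracker, bool isReadRequest, DateTime nowUtc)
    {
        return isEligible && tracker.RecordFailureAndCheck(isReadRequest, nowUtc);
    }
}
```

With the default thresholds, the tenth consecutive read failure (or the fifth write failure) is the first call that returns `true`; calls made with `isEligible: false` do not move the counter at all.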

Microsoft.Azure.Cosmos/src/ConnectionPolicy.cs

Lines changed: 6 additions & 0 deletions
@@ -332,6 +332,12 @@ public bool EnablePartitionLevelFailover
             set;
         }

+        public bool EnablePartitionLevelCircuitBreaker
+        {
+            get;
+            set;
+        }
+
         /// <summary>
         /// Gets or sets the certificate validation callback.
         /// </summary>

Microsoft.Azure.Cosmos/src/CosmosClientOptions.cs

Lines changed: 14 additions & 1 deletion
@@ -729,7 +729,14 @@ public Func<HttpClient> HttpClientFactory
         /// <summary>
         /// Enable partition key level failover
         /// </summary>
-        internal bool EnablePartitionLevelFailover { get; set; } = ConfigurationManager.IsPartitionLevelFailoverEnabled(defaultValue: false);
+        internal bool EnablePartitionLevelFailover { get; set; } = ConfigurationManager.IsPartitionLevelFailoverEnabled(defaultValue: false);
+
+        /// <summary>
+        /// Enable partition level circuit breaker (aka PPCB). For compute gateway use case, by default per partition automatic failover will be disabled, so does the PPCB.
+        /// If compute gateway chooses to enable PPAF, then the .NET SDK will enable PPCB by default, which will improve the read availability and latency. This would mean
+        /// when PPAF is enabled, the SDK will automatically enable PPCB as well.
+        /// </summary>
+        internal bool EnablePartitionLevelCircuitBreaker { get; set; } = ConfigurationManager.IsPartitionLevelCircuitBreakerEnabled(defaultValue: false);

         /// <summary>
         /// Quorum Read allowed with eventual consistency account or consistent prefix account.
@@ -983,6 +990,7 @@ internal virtual ConnectionPolicy GetConnectionPolicy(int clientId)
                 MaxTcpConnectionsPerEndpoint = this.MaxTcpConnectionsPerEndpoint,
                 EnableEndpointDiscovery = !this.LimitToEndpoint,
                 EnablePartitionLevelFailover = this.EnablePartitionLevelFailover,
+                EnablePartitionLevelCircuitBreaker = this.EnablePartitionLevelFailover || this.EnablePartitionLevelCircuitBreaker,
                 PortReuseMode = this.portReuseMode,
                 EnableTcpConnectionEndpointRediscovery = this.EnableTcpConnectionEndpointRediscovery,
                 EnableAdvancedReplicaSelectionForTcp = this.EnableAdvancedReplicaSelectionForTcp,
@@ -1221,6 +1229,11 @@ internal string GetUserAgentSuffix()
                 featureFlag += (int)UserAgentFeatureFlags.PerPartitionAutomaticFailover;
             }

+            if (this.EnablePartitionLevelFailover || this.EnablePartitionLevelCircuitBreaker)
+            {
+                featureFlag += (int)UserAgentFeatureFlags.PerPartitionCircuitBreaker;
+            }
+
             if (featureFlag == 0)
             {
                 return this.ApplicationName;

Microsoft.Azure.Cosmos/src/Diagnostics/UserAgentFeatureFlags.cs

Lines changed: 2 additions & 0 deletions
@@ -16,5 +16,7 @@ namespace Microsoft.Azure.Cosmos
     internal enum UserAgentFeatureFlags
     {
         PerPartitionAutomaticFailover = 1,
+
+        PerPartitionCircuitBreaker = 2,
     }
 }
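
Because `GetUserAgentSuffix` adds the circuit breaker flag whenever either option is enabled, the combined value can be worked out by hand. The sketch below reproduces that arithmetic in isolation. `FeatureFlagDemo.ComputeFeatureFlag` is a hypothetical helper, and the guard on the first branch (`enablePartitionLevelFailover`) is an assumption, since the diff context shows only the body of that `if` and not its condition.

```csharp
using System;

internal enum UserAgentFeatureFlags
{
    PerPartitionAutomaticFailover = 1,
    PerPartitionCircuitBreaker = 2,
}

internal static class FeatureFlagDemo
{
    // Hypothetical stand-alone version of the flag arithmetic in GetUserAgentSuffix.
    internal static int ComputeFeatureFlag(
        bool enablePartitionLevelFailover,
        bool enablePartitionLevelCircuitBreaker)
    {
        int featureFlag = 0;

        // Assumed guard: the condition of this branch is not visible in the diff.
        if (enablePartitionLevelFailover)
        {
            featureFlag += (int)UserAgentFeatureFlags.PerPartitionAutomaticFailover;
        }

        // Added by this commit: PPAF implies PPCB, so either option sets the flag.
        if (enablePartitionLevelFailover || enablePartitionLevelCircuitBreaker)
        {
            featureFlag += (int)UserAgentFeatureFlags.PerPartitionCircuitBreaker;
        }

        return featureFlag;
    }
}
```

Under that assumption the possible values are `0` (neither option), `2` (circuit breaker only), and `3` (failover enabled, with or without the circuit breaker option); a value of `1` on its own no longer occurs.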

Microsoft.Azure.Cosmos/src/DocumentClient.cs

Lines changed: 5 additions & 2 deletions
@@ -939,8 +939,11 @@ internal virtual void Initialize(Uri serviceEndpoint,
 #endif

             this.GlobalEndpointManager = new GlobalEndpointManager(this, this.ConnectionPolicy);
-            this.PartitionKeyRangeLocation = this.ConnectionPolicy.EnablePartitionLevelFailover
-                ? new GlobalPartitionEndpointManagerCore(this.GlobalEndpointManager)
+            this.PartitionKeyRangeLocation = this.ConnectionPolicy.EnablePartitionLevelFailover || this.ConnectionPolicy.EnablePartitionLevelCircuitBreaker
+                ? new GlobalPartitionEndpointManagerCore(
+                    this.GlobalEndpointManager,
+                    this.ConnectionPolicy.EnablePartitionLevelFailover,
+                    this.ConnectionPolicy.EnablePartitionLevelCircuitBreaker)
                 : GlobalPartitionEndpointManagerNoOp.Instance;

             this.httpClient = CosmosHttpClientCore.CreateWithConnectionPolicy(

Microsoft.Azure.Cosmos/src/RetryPolicy.cs

Lines changed: 3 additions & 3 deletions
@@ -13,7 +13,7 @@ internal sealed class RetryPolicy : IRetryPolicyFactory
         private readonly GlobalPartitionEndpointManager partitionKeyRangeLocationCache;
         private readonly GlobalEndpointManager globalEndpointManager;
         private readonly bool enableEndpointDiscovery;
-        private readonly bool isPertitionLevelFailoverEnabled;
+        private readonly bool isPartitionLevelFailoverEnabled;
         private readonly RetryOptions retryOptions;

         /// <summary>
@@ -25,7 +25,7 @@ public RetryPolicy(
             GlobalPartitionEndpointManager partitionKeyRangeLocationCache)
         {
             this.enableEndpointDiscovery = connectionPolicy.EnableEndpointDiscovery;
-            this.isPertitionLevelFailoverEnabled = connectionPolicy.EnablePartitionLevelFailover;
+            this.isPartitionLevelFailoverEnabled = connectionPolicy.EnablePartitionLevelFailover;
             this.globalEndpointManager = globalEndpointManager;
             this.retryOptions = connectionPolicy.RetryOptions;
             this.partitionKeyRangeLocationCache = partitionKeyRangeLocationCache;
@@ -41,7 +41,7 @@ public IDocumentClientRetryPolicy GetRequestPolicy()
                 this.partitionKeyRangeLocationCache,
                 this.retryOptions,
                 this.enableEndpointDiscovery,
-                this.isPertitionLevelFailoverEnabled);
+                this.isPartitionLevelFailoverEnabled);

             return clientRetryPolicy;
         }
