[automatic failover] Refactor cluster to replace `HealthStatus` with `HealthCheck` itself #4244

atakavci · 2025-08-15T11:46:45Z

Closes Refactor Cluster to keep a reference to HealthCheck itselft instead of tracking/updating HealthStatus with events #4240
Closes Move Endpoint into proper package #4242
Closes Set sleep duration 1 ms for handling all the waiters in TrackingConnectionPool.forceDisconnect #4243

Copilot

Pull Request Overview

This PR refactors the cluster management system to replace HealthStatus with HealthCheck objects directly, simplifying the health monitoring architecture. The change moves the Endpoint interface from the mcf package to the main jedis package and introduces a new interface-based design for health checks.

Key changes:

Replaced HealthStatus tracking with direct HealthCheck object references in cluster management
Moved Endpoint interface from redis.clients.jedis.mcf to redis.clients.jedis package
Converted HealthCheck from a concrete class to an interface with HealthCheckImpl as the implementation

Reviewed Changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`MultiClusterPooledConnectionProvider.java`	Updated cluster construction to store `HealthCheck` objects directly instead of tracking `HealthStatus` separately
`HealthCheck.java`	Converted from concrete class to interface defining health check contract
`HealthCheckImpl.java`	New implementation class containing the original `HealthCheck` logic
`HealthStatusManager.java`	Modified to return `HealthCheck` objects and use new implementation class
`Endpoint.java`	Moved from `mcf` package to main `jedis` package
Test files	Updated imports and test logic to work with new architecture

Comments suppressed due to low confidence (1)

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

src/main/java/redis/clients/jedis/mcf/TrackingConnectionPool.java

src/main/java/redis/clients/jedis/mcf/HealthStatusManager.java

src/main/java/redis/clients/jedis/mcf/HealthCheckImpl.java

ggivo

Approving to unblock follow-up PRs. LGTM overall, left some comments to address later.

src/main/java/redis/clients/jedis/mcf/HealthCheckImpl.java

ggivo · 2025-08-20T09:49:25Z

src/main/java/redis/clients/jedis/mcf/HealthCheckImpl.java

+        }
+    }
+
+    public long getTimeout() {


A bit confusing — I would expect this to be the timeout for a single HealthCheck, but instead it represents the maximum time to wait for an Endpoint to reach a status during initialization.

Currently it’s exposed via HealthStatusManager.getMaxWaitFor(Endpoint).
How about calculating this directly there instead?
(We’d likely need to extend the interface to provide a reference to the strategy used for creating the HealthCheck.)

Conceptually it is suboptimal for status manager to contain that logic.
We can just rename it back to something like getMaxWaitFor or getMaxResultDuration, etc..
i agree with "confusing" part.

Still think this method should not be part of HealthCheck. It uses only info from strategy it was build from
Let's rename it for now

well,, the source of info is one thing,, while the logic/implementation that relies on that info is another.
So having all info provided from strategy is not the only argument to come to decision.
i think this is something to discuss with less priority at another time.

src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java

…refactorCluster

- rename getTimeout() -> getMaxWaitFor() - remove unnecessary noHealthChecks

…refactorCluster

@a-TODO-rov

…4228) * Setup test infra for automatic failover tests * Add toxyproxy * Add test endpoints * Add example tests * [automatic-failover] Enforce formatting for automatic failover codebase (#4176) * enforce formating for mcf * enforce formating for mcf * Trigger integration test jobs feature branches * [automatic failover] Improve failover resilience by evaluating circuit breaker before retries (#4178) * fix: Change evaluation order of circuit breaker and retry mechanisms This commit modifies the order in which circuit breaker and retry mechanisms are evaluated during failover scenarios. Previously, the circuit breaker was applied after all retries were exhausted, which could delay failover in unstable network conditions. Now, the circuit breaker counts each connection error separately, even during retry attempts. * format * format * use java-8 compatible formater version * fix AutomaticFailoverTest * formating * [automatic-failover] Implement failover retry for in-flight commands (#4175) * Implement failover retry for in-flight commands This change adds support for retrying in-flight commands when a Redis connection fails and the client automatically fails over to another cluster. The feature is configurable through the FailoverOptions builder. Key changes: - Added retryFailedInflightCommands option to FailoverOptions - Implemented retry logic in CircuitBreakerCommandExecutor - Added integration tests to verify both retry and no-retry behavior - Created utility methods for test setup and configuration This enhancement improves resilience for long-running commands like blocking operations, allowing them to transparently continue on the failover cluster without client-side errors. * Update src/main/java/redis/clients/jedis/mcf/CircuitBreakerCommandExecutor.java Co-authored-by: Copilot <[email protected]> # Conflicts: # src/main/java/redis/clients/jedis/mcf/CircuitBreakerCommandExecutor.java * format format & clean-up * fix FailoverIntegrationTest.testInflightCommandsAreNotRetriedAfterFailover blpop timeout should be less than async command timeout to prevent completing with java.util.concurrent.TimeoutException instead of actuall command failure * Address comments from review - rename retryFailedInflightCommands->retryOnFailover - remove check for CB in OPEN state * remove unused method --------- Co-authored-by: Copilot <[email protected]> * [automatic-failover] Fix formatting for anticipated AA Failover changes (#4183) * fix formatting for anticipated failover changes * additional formatting to cover given folders * [automatic failover] Implement HealtStatusManager + weighted endpoints (#4189) * - weighted cluster seleciton - Healtstatus manager with initial listener and registration logic - pluggable health checker strategy introduced, these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy, - fix failing tests impacted from weighted clusters * - add builder for ClusterConfig - add echo ot CommandObjects and UnifiedJEdis - improve StrategySupplier by accepting jedisclientconfig - adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig - make healthchecks disabled by default - drop noOpStrategy - add unit&integration tests for health check * - fix naming * clean up and mark override methods * fix link in javadoc * fix formatting * - fix double registered listeners in healtstatusmgr - clear redundant catch - replace failover options and drop failoveroptions class - remove forced_unhealthy from healthstatus - fix failback check - add disabled flag to cluster - update/fix related tests * Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java Co-authored-by: Copilot <[email protected]> --------- Co-authored-by: Copilot <[email protected]> * [automatic failover] Implement add/remove endpoints (#4200) * - weighted cluster seleciton - Healtstatus manager with initial listener and registration logic - pluggable health checker strategy introduced, these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy, - fix failing tests impacted from weighted clusters * - add builder for ClusterConfig - add echo ot CommandObjects and UnifiedJEdis - improve StrategySupplier by accepting jedisclientconfig - adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig - make healthchecks disabled by default - drop noOpStrategy - add unit&integration tests for health check * - fix naming * clean up and mark override methods * fix link in javadoc * fix formatting * - fix double registered listeners in healtstatusmgr - clear redundant catch - replace failover options and drop failoveroptions class - remove forced_unhealthy from healthstatus - fix failback check - add disabled flag to cluster - update/fix related tests * Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java Co-authored-by: Copilot <[email protected]> * - add remove endpoints * - replace cluster disabled with failbackCandidate - replace failback enabled with failbacksupported in client - fix formatting - set defaults * - remove failback candidate - fix failing tests * - fix remove logic - fix failing tests --------- Co-authored-by: Copilot <[email protected]> * [automatic failover] Implement failback with given grace period (#4204) * - weighted cluster seleciton - Healtstatus manager with initial listener and registration logic - pluggable health checker strategy introduced, these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy, - fix failing tests impacted from weighted clusters * - add builder for ClusterConfig - add echo ot CommandObjects and UnifiedJEdis - improve StrategySupplier by accepting jedisclientconfig - adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig - make healthchecks disabled by default - drop noOpStrategy - add unit&integration tests for health check * - fix naming * clean up and mark override methods * fix link in javadoc * fix formatting * - fix double registered listeners in healtstatusmgr - clear redundant catch - replace failover options and drop failoveroptions class - remove forced_unhealthy from healthstatus - fix failback check - add disabled flag to cluster - update/fix related tests * Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java Co-authored-by: Copilot <[email protected]> * - add remove endpoints * - replace cluster disabled with failbackCandidate - replace failback enabled with failbacksupported in client - fix formatting - set defaults * - remove failback candidate - fix failing tests * - fix remove logic - fix failing tests * - periodic failback checks - introduce graceperiod - fix issue when CB is forced_open and gracePeriod is completed * - format * - fix timing issue --------- Co-authored-by: Copilot <[email protected]> * [automatic failover] Implement wait on healthCheck results during client init (#4207) * - weighted cluster seleciton - Healtstatus manager with initial listener and registration logic - pluggable health checker strategy introduced, these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy, - fix failing tests impacted from weighted clusters * - add builder for ClusterConfig - add echo ot CommandObjects and UnifiedJEdis - improve StrategySupplier by accepting jedisclientconfig - adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig - make healthchecks disabled by default - drop noOpStrategy - add unit&integration tests for health check * - fix naming * clean up and mark override methods * fix link in javadoc * fix formatting * - fix double registered listeners in healtstatusmgr - clear redundant catch - replace failover options and drop failoveroptions class - remove forced_unhealthy from healthstatus - fix failback check - add disabled flag to cluster - update/fix related tests * Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java Co-authored-by: Copilot <[email protected]> * - add remove endpoints * - replace cluster disabled with failbackCandidate - replace failback enabled with failbacksupported in client - fix formatting - set defaults * - remove failback candidate - fix failing tests * - fix remove logic - fix failing tests * - periodic failback checks - introduce graceperiod - fix issue when CB is forced_open and gracePeriod is completed * - introduce StatusTracker with purpose of waiting initial healthcheck results during consturction of provider - add HealthStatus.UNKNOWN as default for Cluster - handle status changes in order of events during initialization - add tests for status tracker and orderingof events - fix impacted unit&integ tests * - introduce forceActiveCluster by duration - fix formatting * - fix failing tests by waiting on clusters to get healthy * - fix failing scenario test - downgrade logback version for slf4j compatibility - increase timeouts for faultInjector * - adressing reviews and feedback * - fix formatting * - fix formatting * - get rid of the queue and event ordering for healthstatus change in MultiClusterPooledConnectionProvider - add test for init and post init events - fix failing tests * - replace use of reflection with helper methods - fix failing tests due to method name change * - replace 'sleep' with 'await', feedback from @a-TODO-rov --------- Co-authored-by: Copilot <[email protected]> * [automatic failover] Introduce fast failover mode - a thread-sync-free approach with builders (#4226) * - weighted cluster seleciton - Healtstatus manager with initial listener and registration logic - pluggable health checker strategy introduced, these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy, - fix failing tests impacted from weighted clusters * - add builder for ClusterConfig - add echo ot CommandObjects and UnifiedJEdis - improve StrategySupplier by accepting jedisclientconfig - adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig - make healthchecks disabled by default - drop noOpStrategy - add unit&integration tests for health check * - fix naming * clean up and mark override methods * fix link in javadoc * fix formatting * - fix double registered listeners in healtstatusmgr - clear redundant catch - replace failover options and drop failoveroptions class - remove forced_unhealthy from healthstatus - fix failback check - add disabled flag to cluster - update/fix related tests * Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java Co-authored-by: Copilot <[email protected]> * - add remove endpoints * - replace cluster disabled with failbackCandidate - replace failback enabled with failbacksupported in client - fix formatting - set defaults * - remove failback candidate - fix failing tests * - fix remove logic - fix failing tests * - periodic failback checks - introduce graceperiod - fix issue when CB is forced_open and gracePeriod is completed * - introduce StatusTracker with purpose of waiting initial healthcheck results during consturction of provider - add HealthStatus.UNKNOWN as default for Cluster - handle status changes in order of events during initialization - add tests for status tracker and orderingof events - fix impacted unit&integ tests * - introduce forceActiveCluster by duration - fix formatting * - fix failing tests by waiting on clusters to get healthy * - fix failing scenario test - downgrade logback version for slf4j compatibility - increase timeouts for faultInjector * - adressing reviews and feedback * - fix formatting * - fix formatting * - get rid of the queue and event ordering for healthstatus change in MultiClusterPooledConnectionProvider - add test for init and post init events - fix failing tests * - replace use of reflection with helper methods - fix failing tests due to method name change * - introduce clusterSwitchEvent and drop clusterFailover post processor - fix broken echostrategy due to connection issue - make healtthCheckStrategy closable and close on - adding fastfailover mode to config and provider - add local failover tests for total failover duration * - introduce fastfailover using objectMaker injection into connectionFactory * - polish * - cleanup * - improve healtcheck thread visibility * - introduce TrackingConnectionPool with FailFastConnectionFactory - added builders to connection and connectionFactory - introduce initializtionTracker to track list of connections during their construction. * - return broken source as usual - do not throw exception is failover already happening * - unblock waiting threads * - failover by closing the pool * - formatting * - check waiters and active/idle connections to force disconnect * - add builder to trackingconnectionpool * - fix failing tests due to mocked ctor for trackingConnectionPool * - replace initTracker with split initializtion of conn * - refactor on builders and ctors * - undo format * - clena up * - add exceptions to logs * - add max wait duration and minConsecutiveSuccessCount to healthCheckStrategy * - replace 'sleep' with 'await', feedback from @a-TODO-rov * - fix status tracker tests --------- Co-authored-by: Copilot <[email protected]> * [automatic failover] Add tests: EchoStrategy does recover on connection error (#4230) * Add tests: EchoStrategy does recover on connection error * log error's on exception * [automatic failover] Replace 'SimpleEntry' with 'HealthCheckResult' (#4236) * - replace SimpleEntry with HealthCheckResult * - format * [automatic failover] Add rate limiter to test MultiThreadedFakeApp (#4248) * Add option for rate limiter to MultiThreadedFakeApp * retrigger checks * [automatic failover] Implement lag-aware api (#4235) * - lag-aware api * - introduce healtCheckMetaConfig - add config objects for both EchoStrategy and LagAwareStrategy - make StrategySupplier interface generic - fix impacted tests * - fix issues with docs and log * Fix EchoStrategyIntegrationTest after merge * Implement resolve bdbId based on configured HostAndPort * format & merge RedisRestAPI unit tests * introduce builder for LagAwareStrategy * configurable extended-check - provide option to enable/disable lag check - default to lag-aware check with AVAILABILITY_LAG_TOLERANCE_DEFAULT of 100ms * - fix java doc issue * - formatting * -fix doc issue * - formatting * revert introduce healtCheckMetaConfig - introduces type unsafe warnings, revert till they are resovled * Add list constructor * Add option for rate limiter to MultiThreadedFakeApp * Use /v1/local/bdbs/<int:uid>/endpoint/availability - use /v1/local/bdbs/<int:uid>/endpoint/availability instead of availability address vs /v1/bdbs/<int:uid>/availability * Add builder factory method for base HealthCheckStrategy.Config * format * format * revert to /v1/bdbs/<int:uid>/availability reading again the spec /local is inteded to be used when configuring LoadBalancers * revert to /v1/bdbs/<int:uid>/availability reading again the spec /local is inteded to be used when configuring LoadBalancers * format * fix unit test after revert of local * format * Address review comments - remove misleading default value comment - renamed Config.endpoint to Config.restEndpoint - renamed dataPathAvailability -> databaseAvailability --------- Co-authored-by: ggivo <[email protected]> * [automatic failover] Refactor cluster to replace `HealthStatus` with `HealthCheck` itself (#4244) * - set sleep duration 1 ms for forceDisconnect * - refactor cluster to replace HealthStatus with HealthCheck itself * - move Endpoint to parent package * - format * - fix typo * - adjust tests with new provider behaviour with HealthCheck * Address review comments - rename getTimeout() -> getMaxWaitFor() - remove unnecessary noHealthChecks * resolve merge conflicts * format * format --------- Co-authored-by: Ivo Gaydazhiev <[email protected]> * [automatic failover] fix: health checks continue to run even after provider is closed. (#4252) * fix: health checks continue to run even after provider is closed. Closes #4251 * format * format & fix tests after merge format * [automatic failover] Improve AA failover scenario test (#4249) * Improve AA failover test * Fix endpoint name * Count failedCommandsAfterFailover * do not enforce formating on ActiveActiveFailoverTest --------- Co-authored-by: Ivo Gaydazhiev <[email protected]> * [automatic failover ]fix: only notify health status listeners on actual state changes #4255 (#4256) * fix: only notify health status listeners on actual state changes #4255 - Replace overlapping health checks with sequential execution - use wasUpdated flag to prevent false notifications from timestamp conflicts. Eliminates race conditions causing spurious failovers. Closes: #4255 * format * Update src/main/java/redis/clients/jedis/mcf/HealthCheckImpl.java Co-authored-by: atakavci <[email protected]> * revert to fixed rate health checks * fix ActiveActiveLocalFailoverTest - reduced endpoint pause to 10s to speed up test - ensure test endpoint is eavailable before startign the test (awaits up to endpoint pause time) --------- Co-authored-by: atakavci <[email protected]> * Connection.forceDisconnect should also set the connection as broken #4233 (#4258) Address: Connection.forceDisconnect should also set the connection as broken #4233 * [automatic failover] Update failover docs (#4257) * Initial draft * Cover health checks * Add javadoc's for MultiClusterClientConfig * Fix docs for health checks * Fix javadoc's * Update wordlist.txt --------- Co-authored-by: Ivo Gaydazhiev <[email protected]> * rever hbase-formatter.xml changes - ensure hbase-formatter.xml match current one in master * reformat to comply with hbase-formater rules in master * fix errors after merge - setClusterFailoverPostProcessor renamed - format AutomaticFailoverTest * format - Endpoint.java * format - ActiveActiveLocalFailoverTest - FailbackMechanismIntegrationTest - FailbackMechanismUnitTest - PeriodicFailbackTest - StatusTrackerTest format - ActiveActiveLocalFailoverTest.java * Add failover containers for testing to local test infra * [automatic failover] Address review comments (#4264) Address review comments - replace magic numbers in tests with constats --------- Co-authored-by: ggivo <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: Igor Malinovskiy <[email protected]>

@a-TODO-rov

* Setup test infra for automatic failover tests * Add toxyproxy * Add test endpoints * Add example tests * [automatic-failover] Enforce formatting for automatic failover codebase (#4176) * enforce formating for mcf * enforce formating for mcf * Trigger integration test jobs feature branches * [automatic failover] Improve failover resilience by evaluating circuit breaker before retries (#4178) * fix: Change evaluation order of circuit breaker and retry mechanisms This commit modifies the order in which circuit breaker and retry mechanisms are evaluated during failover scenarios. Previously, the circuit breaker was applied after all retries were exhausted, which could delay failover in unstable network conditions. Now, the circuit breaker counts each connection error separately, even during retry attempts. * format * format * use java-8 compatible formater version * fix AutomaticFailoverTest * formating * [automatic-failover] Implement failover retry for in-flight commands (#4175) * Implement failover retry for in-flight commands This change adds support for retrying in-flight commands when a Redis connection fails and the client automatically fails over to another cluster. The feature is configurable through the FailoverOptions builder. Key changes: - Added retryFailedInflightCommands option to FailoverOptions - Implemented retry logic in CircuitBreakerCommandExecutor - Added integration tests to verify both retry and no-retry behavior - Created utility methods for test setup and configuration This enhancement improves resilience for long-running commands like blocking operations, allowing them to transparently continue on the failover cluster without client-side errors. * Update src/main/java/redis/clients/jedis/mcf/CircuitBreakerCommandExecutor.java Co-authored-by: Copilot <[email protected]> # Conflicts: # src/main/java/redis/clients/jedis/mcf/CircuitBreakerCommandExecutor.java * format format & clean-up * fix FailoverIntegrationTest.testInflightCommandsAreNotRetriedAfterFailover blpop timeout should be less than async command timeout to prevent completing with java.util.concurrent.TimeoutException instead of actuall command failure * Address comments from review - rename retryFailedInflightCommands->retryOnFailover - remove check for CB in OPEN state * remove unused method --------- Co-authored-by: Copilot <[email protected]> * [automatic-failover] Fix formatting for anticipated AA Failover changes (#4183) * fix formatting for anticipated failover changes * additional formatting to cover given folders * [automatic failover] Implement HealtStatusManager + weighted endpoints (#4189) * - weighted cluster seleciton - Healtstatus manager with initial listener and registration logic - pluggable health checker strategy introduced, these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy, - fix failing tests impacted from weighted clusters * - add builder for ClusterConfig - add echo ot CommandObjects and UnifiedJEdis - improve StrategySupplier by accepting jedisclientconfig - adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig - make healthchecks disabled by default - drop noOpStrategy - add unit&integration tests for health check * - fix naming * clean up and mark override methods * fix link in javadoc * fix formatting * - fix double registered listeners in healtstatusmgr - clear redundant catch - replace failover options and drop failoveroptions class - remove forced_unhealthy from healthstatus - fix failback check - add disabled flag to cluster - update/fix related tests * Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java Co-authored-by: Copilot <[email protected]> --------- Co-authored-by: Copilot <[email protected]> * [automatic failover] Implement add/remove endpoints (#4200) * - weighted cluster seleciton - Healtstatus manager with initial listener and registration logic - pluggable health checker strategy introduced, these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy, - fix failing tests impacted from weighted clusters * - add builder for ClusterConfig - add echo ot CommandObjects and UnifiedJEdis - improve StrategySupplier by accepting jedisclientconfig - adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig - make healthchecks disabled by default - drop noOpStrategy - add unit&integration tests for health check * - fix naming * clean up and mark override methods * fix link in javadoc * fix formatting * - fix double registered listeners in healtstatusmgr - clear redundant catch - replace failover options and drop failoveroptions class - remove forced_unhealthy from healthstatus - fix failback check - add disabled flag to cluster - update/fix related tests * Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java Co-authored-by: Copilot <[email protected]> * - add remove endpoints * - replace cluster disabled with failbackCandidate - replace failback enabled with failbacksupported in client - fix formatting - set defaults * - remove failback candidate - fix failing tests * - fix remove logic - fix failing tests --------- Co-authored-by: Copilot <[email protected]> * [automatic failover] Implement failback with given grace period (#4204) * - weighted cluster seleciton - Healtstatus manager with initial listener and registration logic - pluggable health checker strategy introduced, these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy, - fix failing tests impacted from weighted clusters * - add builder for ClusterConfig - add echo ot CommandObjects and UnifiedJEdis - improve StrategySupplier by accepting jedisclientconfig - adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig - make healthchecks disabled by default - drop noOpStrategy - add unit&integration tests for health check * - fix naming * clean up and mark override methods * fix link in javadoc * fix formatting * - fix double registered listeners in healtstatusmgr - clear redundant catch - replace failover options and drop failoveroptions class - remove forced_unhealthy from healthstatus - fix failback check - add disabled flag to cluster - update/fix related tests * Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java Co-authored-by: Copilot <[email protected]> * - add remove endpoints * - replace cluster disabled with failbackCandidate - replace failback enabled with failbacksupported in client - fix formatting - set defaults * - remove failback candidate - fix failing tests * - fix remove logic - fix failing tests * - periodic failback checks - introduce graceperiod - fix issue when CB is forced_open and gracePeriod is completed * - format * - fix timing issue --------- Co-authored-by: Copilot <[email protected]> * [automatic failover] Implement wait on healthCheck results during client init (#4207) * - weighted cluster seleciton - Healtstatus manager with initial listener and registration logic - pluggable health checker strategy introduced, these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy, - fix failing tests impacted from weighted clusters * - add builder for ClusterConfig - add echo ot CommandObjects and UnifiedJEdis - improve StrategySupplier by accepting jedisclientconfig - adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig - make healthchecks disabled by default - drop noOpStrategy - add unit&integration tests for health check * - fix naming * clean up and mark override methods * fix link in javadoc * fix formatting * - fix double registered listeners in healtstatusmgr - clear redundant catch - replace failover options and drop failoveroptions class - remove forced_unhealthy from healthstatus - fix failback check - add disabled flag to cluster - update/fix related tests * Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java Co-authored-by: Copilot <[email protected]> * - add remove endpoints * - replace cluster disabled with failbackCandidate - replace failback enabled with failbacksupported in client - fix formatting - set defaults * - remove failback candidate - fix failing tests * - fix remove logic - fix failing tests * - periodic failback checks - introduce graceperiod - fix issue when CB is forced_open and gracePeriod is completed * - introduce StatusTracker with purpose of waiting initial healthcheck results during consturction of provider - add HealthStatus.UNKNOWN as default for Cluster - handle status changes in order of events during initialization - add tests for status tracker and orderingof events - fix impacted unit&integ tests * - introduce forceActiveCluster by duration - fix formatting * - fix failing tests by waiting on clusters to get healthy * - fix failing scenario test - downgrade logback version for slf4j compatibility - increase timeouts for faultInjector * - adressing reviews and feedback * - fix formatting * - fix formatting * - get rid of the queue and event ordering for healthstatus change in MultiClusterPooledConnectionProvider - add test for init and post init events - fix failing tests * - replace use of reflection with helper methods - fix failing tests due to method name change * - replace 'sleep' with 'await', feedback from @a-TODO-rov --------- Co-authored-by: Copilot <[email protected]> * [automatic failover] Introduce fast failover mode - a thread-sync-free approach with builders (#4226) * - weighted cluster seleciton - Healtstatus manager with initial listener and registration logic - pluggable health checker strategy introduced, these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy, - fix failing tests impacted from weighted clusters * - add builder for ClusterConfig - add echo ot CommandObjects and UnifiedJEdis - improve StrategySupplier by accepting jedisclientconfig - adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig - make healthchecks disabled by default - drop noOpStrategy - add unit&integration tests for health check * - fix naming * clean up and mark override methods * fix link in javadoc * fix formatting * - fix double registered listeners in healtstatusmgr - clear redundant catch - replace failover options and drop failoveroptions class - remove forced_unhealthy from healthstatus - fix failback check - add disabled flag to cluster - update/fix related tests * Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java Co-authored-by: Copilot <[email protected]> * - add remove endpoints * - replace cluster disabled with failbackCandidate - replace failback enabled with failbacksupported in client - fix formatting - set defaults * - remove failback candidate - fix failing tests * - fix remove logic - fix failing tests * - periodic failback checks - introduce graceperiod - fix issue when CB is forced_open and gracePeriod is completed * - introduce StatusTracker with purpose of waiting initial healthcheck results during consturction of provider - add HealthStatus.UNKNOWN as default for Cluster - handle status changes in order of events during initialization - add tests for status tracker and orderingof events - fix impacted unit&integ tests * - introduce forceActiveCluster by duration - fix formatting * - fix failing tests by waiting on clusters to get healthy * - fix failing scenario test - downgrade logback version for slf4j compatibility - increase timeouts for faultInjector * - adressing reviews and feedback * - fix formatting * - fix formatting * - get rid of the queue and event ordering for healthstatus change in MultiClusterPooledConnectionProvider - add test for init and post init events - fix failing tests * - replace use of reflection with helper methods - fix failing tests due to method name change * - introduce clusterSwitchEvent and drop clusterFailover post processor - fix broken echostrategy due to connection issue - make healtthCheckStrategy closable and close on - adding fastfailover mode to config and provider - add local failover tests for total failover duration * - introduce fastfailover using objectMaker injection into connectionFactory * - polish * - cleanup * - improve healtcheck thread visibility * - introduce TrackingConnectionPool with FailFastConnectionFactory - added builders to connection and connectionFactory - introduce initializtionTracker to track list of connections during their construction. * - return broken source as usual - do not throw exception is failover already happening * - unblock waiting threads * - failover by closing the pool * - formatting * - check waiters and active/idle connections to force disconnect * - add builder to trackingconnectionpool * - fix failing tests due to mocked ctor for trackingConnectionPool * - replace initTracker with split initializtion of conn * - refactor on builders and ctors * - undo format * - clena up * - add exceptions to logs * - add max wait duration and minConsecutiveSuccessCount to healthCheckStrategy * - replace 'sleep' with 'await', feedback from @a-TODO-rov * - fix status tracker tests --------- Co-authored-by: Copilot <[email protected]> * [automatic failover] Add tests: EchoStrategy does recover on connection error (#4230) * Add tests: EchoStrategy does recover on connection error * log error's on exception * [automatic failover] Replace 'SimpleEntry' with 'HealthCheckResult' (#4236) * - replace SimpleEntry with HealthCheckResult * - format * [automatic failover] Add rate limiter to test MultiThreadedFakeApp (#4248) * Add option for rate limiter to MultiThreadedFakeApp * retrigger checks * [automatic failover] Implement lag-aware api (#4235) * - lag-aware api * - introduce healtCheckMetaConfig - add config objects for both EchoStrategy and LagAwareStrategy - make StrategySupplier interface generic - fix impacted tests * - fix issues with docs and log * Fix EchoStrategyIntegrationTest after merge * Implement resolve bdbId based on configured HostAndPort * format & merge RedisRestAPI unit tests * introduce builder for LagAwareStrategy * configurable extended-check - provide option to enable/disable lag check - default to lag-aware check with AVAILABILITY_LAG_TOLERANCE_DEFAULT of 100ms * - fix java doc issue * - formatting * -fix doc issue * - formatting * revert introduce healtCheckMetaConfig - introduces type unsafe warnings, revert till they are resovled * Add list constructor * Add option for rate limiter to MultiThreadedFakeApp * Use /v1/local/bdbs/<int:uid>/endpoint/availability - use /v1/local/bdbs/<int:uid>/endpoint/availability instead of availability address vs /v1/bdbs/<int:uid>/availability * Add builder factory method for base HealthCheckStrategy.Config * format * format * revert to /v1/bdbs/<int:uid>/availability reading again the spec /local is inteded to be used when configuring LoadBalancers * revert to /v1/bdbs/<int:uid>/availability reading again the spec /local is inteded to be used when configuring LoadBalancers * format * fix unit test after revert of local * format * Address review comments - remove misleading default value comment - renamed Config.endpoint to Config.restEndpoint - renamed dataPathAvailability -> databaseAvailability --------- Co-authored-by: ggivo <[email protected]> * [automatic failover] Refactor cluster to replace `HealthStatus` with `HealthCheck` itself (#4244) * - set sleep duration 1 ms for forceDisconnect * - refactor cluster to replace HealthStatus with HealthCheck itself * - move Endpoint to parent package * - format * - fix typo * - adjust tests with new provider behaviour with HealthCheck * Address review comments - rename getTimeout() -> getMaxWaitFor() - remove unnecessary noHealthChecks * resolve merge conflicts * format * format --------- Co-authored-by: Ivo Gaydazhiev <[email protected]> * [automatic failover] fix: health checks continue to run even after provider is closed. (#4252) * fix: health checks continue to run even after provider is closed. Closes #4251 * format * format & fix tests after merge format * [automatic failover] Improve AA failover scenario test (#4249) * Improve AA failover test * Fix endpoint name * Count failedCommandsAfterFailover * do not enforce formating on ActiveActiveFailoverTest --------- Co-authored-by: Ivo Gaydazhiev <[email protected]> * [automatic failover ]fix: only notify health status listeners on actual state changes #4255 (#4256) * fix: only notify health status listeners on actual state changes #4255 - Replace overlapping health checks with sequential execution - use wasUpdated flag to prevent false notifications from timestamp conflicts. Eliminates race conditions causing spurious failovers. Closes: #4255 * format * Update src/main/java/redis/clients/jedis/mcf/HealthCheckImpl.java Co-authored-by: atakavci <[email protected]> * revert to fixed rate health checks * fix ActiveActiveLocalFailoverTest - reduced endpoint pause to 10s to speed up test - ensure test endpoint is eavailable before startign the test (awaits up to endpoint pause time) --------- Co-authored-by: atakavci <[email protected]> * Connection.forceDisconnect should also set the connection as broken #4233 (#4258) Address: Connection.forceDisconnect should also set the connection as broken #4233 * [automatic failover] Update failover docs (#4257) * Initial draft * Cover health checks * Add javadoc's for MultiClusterClientConfig * Fix docs for health checks * Fix javadoc's * Update wordlist.txt --------- Co-authored-by: Ivo Gaydazhiev <[email protected]> * rever hbase-formatter.xml changes - ensure hbase-formatter.xml match current one in master * reformat to comply with hbase-formater rules in master * fix errors after merge - setClusterFailoverPostProcessor renamed - format AutomaticFailoverTest * format - Endpoint.java * format - ActiveActiveLocalFailoverTest - FailbackMechanismIntegrationTest - FailbackMechanismUnitTest - PeriodicFailbackTest - StatusTrackerTest format - ActiveActiveLocalFailoverTest.java * Add failover containers for testing to local test infra * Introduce Builders for classes based on UnifiedJedis * Cleanup Builders and fix minor issues * Fix formatting #1 * Improve javadocs * Enable params visibility in the CI * Remove redundant JVM flags * Guard the reflection tests Skip the reflection tests if the "with-param-names" flag is not enabled during the test run * [automatic failover] Address review comments (#4264) Address review comments - replace magic numbers in tests with constats * Add unit tests for builders * Address review comments * Add missing changes after revert * Fix error after merge - resolve cannot find symbol Error: symbol: class MultiClusterPooledConnectionProvider * Allow overriding CommandObjects in builders * Fix formatting * Pull clientConfig to the AbstractClientBuilder and remove redundant redisProtocol * Create default client config in AbstractClientBuilder Instead of creating it in createDefaultConnectionProvider() * Use builder in JedisPooled tests * Use builder in ClusterCommand tests * Fix formatting * Add missing value for maxTotalRetriesDuration * [churn] Code clean up * Use builder in SentineledConnectionProviderTest --------- Co-authored-by: ggivo <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: atakavci <[email protected]>

atakavci requested review from a-TODO-rov, Copilot, ggivo and uglide August 15, 2025 11:46

atakavci self-assigned this Aug 15, 2025

atakavci added the feature label Aug 15, 2025

This comment was marked as outdated.

Sign in to view

atakavci added 6 commits August 16, 2025 14:43

- set sleep duration 1 ms for forceDisconnect

1f102a6

- refactor cluster to replace HealthStatus with HealthCheck itself

d44a309

- move Endpoint to parent package

3cbd296

- format

e5bec2e

- fix typo

f9eda0f

- adjust tests with new provider behaviour with HealthCheck

32746a3

atakavci force-pushed the ali/aa-failover-#4233-refactorCluster branch from 93bdc9a to 32746a3 Compare August 16, 2025 12:00

atakavci requested a review from Copilot August 16, 2025 14:05

This comment was marked as outdated.

Sign in to view

atakavci requested a review from Copilot August 18, 2025 08:17

Copilot AI reviewed Aug 18, 2025

View reviewed changes

src/main/java/redis/clients/jedis/mcf/TrackingConnectionPool.java Show resolved Hide resolved

src/main/java/redis/clients/jedis/mcf/HealthStatusManager.java Outdated Show resolved Hide resolved

src/main/java/redis/clients/jedis/mcf/HealthCheckImpl.java Show resolved Hide resolved

ggivo approved these changes Aug 20, 2025

View reviewed changes

ggivo added 6 commits August 20, 2025 13:28

Merge branch 'feature/automatic-failover' into ali/aa-failover-#4233-…

8315bcd

…refactorCluster

Address review comments

11b02da

- rename getTimeout() -> getMaxWaitFor() - remove unnecessary noHealthChecks

Merge branch 'feature/automatic-failover' into ali/aa-failover-#4233-…

6bfadcb

…refactorCluster

resolve merge conflicts

e661cf5

format

6f5c70d

format

300857e

ggivo merged commit 2972dfa into redis:feature/automatic-failover Aug 20, 2025
8 of 9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[automatic failover] Refactor cluster to replace `HealthStatus` with `HealthCheck` itself #4244

[automatic failover] Refactor cluster to replace `HealthStatus` with `HealthCheck` itself #4244

Uh oh!

atakavci commented Aug 15, 2025 •

edited by ggivo

Loading

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ggivo left a comment

Uh oh!

Uh oh!

ggivo Aug 20, 2025

Uh oh!

atakavci Aug 20, 2025

Uh oh!

ggivo Aug 20, 2025

Uh oh!

atakavci Aug 20, 2025

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[automatic failover] Refactor cluster to replace HealthStatus with HealthCheck itself #4244

[automatic failover] Refactor cluster to replace HealthStatus with HealthCheck itself #4244

Uh oh!

Conversation

atakavci commented Aug 15, 2025 • edited by ggivo Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ggivo left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ggivo Aug 20, 2025

Choose a reason for hiding this comment

Uh oh!

atakavci Aug 20, 2025

Choose a reason for hiding this comment

Uh oh!

ggivo Aug 20, 2025

Choose a reason for hiding this comment

Uh oh!

atakavci Aug 20, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[automatic failover] Refactor cluster to replace `HealthStatus` with `HealthCheck` itself #4244

[automatic failover] Refactor cluster to replace `HealthStatus` with `HealthCheck` itself #4244

atakavci commented Aug 15, 2025 •

edited by ggivo

Loading