Skip to content

Conversation

@atakavci atakavci self-assigned this Aug 15, 2025

This comment was marked as outdated.

@atakavci atakavci force-pushed the ali/aa-failover-#4233-refactorCluster branch from 93bdc9a to 32746a3 Compare August 16, 2025 12:00
@atakavci atakavci requested a review from Copilot August 16, 2025 14:05

This comment was marked as outdated.

@atakavci atakavci requested a review from Copilot August 18, 2025 08:17
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the cluster management system to replace HealthStatus with HealthCheck objects directly, simplifying the health monitoring architecture. The change moves the Endpoint interface from the mcf package to the main jedis package and introduces a new interface-based design for health checks.

Key changes:

  • Replaced HealthStatus tracking with direct HealthCheck object references in cluster management
  • Moved Endpoint interface from redis.clients.jedis.mcf to redis.clients.jedis package
  • Converted HealthCheck from a concrete class to an interface with HealthCheckImpl as the implementation

Reviewed Changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
MultiClusterPooledConnectionProvider.java Updated cluster construction to store HealthCheck objects directly instead of tracking HealthStatus separately
HealthCheck.java Converted from concrete class to interface defining health check contract
HealthCheckImpl.java New implementation class containing the original HealthCheck logic
HealthStatusManager.java Modified to return HealthCheck objects and use new implementation class
Endpoint.java Moved from mcf package to main jedis package
Test files Updated imports and test logic to work with new architecture
Comments suppressed due to low confidence (1)

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Collaborator

@ggivo ggivo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to unblock follow-up PRs. LGTM overall, left some comments to address later.

}
}

public long getTimeout() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit confusing — I would expect this to be the timeout for a single HealthCheck, but instead it represents the maximum time to wait for an Endpoint to reach a status during initialization.

Currently it’s exposed via HealthStatusManager.getMaxWaitFor(Endpoint).
How about calculating this directly there instead?
(We’d likely need to extend the interface to provide a reference to the strategy used for creating the HealthCheck.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptually it is suboptimal for status manager to contain that logic.
We can just rename it back to something like getMaxWaitFor or getMaxResultDuration, etc..
i agree with "confusing" part.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still think this method should not be part of HealthCheck. It uses only info from strategy it was build from
Let's rename it for now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well,, the source of info is one thing,, while the logic/implementation that relies on that info is another.
So having all info provided from strategy is not the only argument to come to decision.
i think this is something to discuss with less priority at another time.

@ggivo ggivo merged commit 2972dfa into redis:feature/automatic-failover Aug 20, 2025
8 of 9 checks passed
ggivo added a commit that referenced this pull request Sep 1, 2025
…4228)

* Setup test infra for automatic failover tests
  * Add toxyproxy
  * Add test endpoints
  * Add example tests

* [automatic-failover] Enforce formatting for automatic failover codebase (#4176)

* enforce formating for mcf

* enforce formating for mcf

* Trigger integration test jobs  feature branches

* [automatic failover] Improve failover resilience by evaluating circuit breaker before retries (#4178)

* fix: Change evaluation order of circuit breaker and retry mechanisms

This commit modifies the order in which circuit breaker and retry mechanisms
are evaluated during failover scenarios. Previously, the circuit breaker was
applied after all retries were exhausted, which could delay failover in
unstable network conditions. Now, the circuit breaker counts each connection
error separately, even during retry attempts.

* format

* format

* use java-8 compatible formater version

* fix AutomaticFailoverTest

* formating

* [automatic-failover] Implement failover retry for in-flight commands (#4175)

* Implement failover retry for in-flight commands

This change adds support for retrying in-flight commands when a Redis connection
fails and the client automatically fails over to another cluster. The feature
is configurable through the FailoverOptions builder.

Key changes:
- Added retryFailedInflightCommands option to FailoverOptions
- Implemented retry logic in CircuitBreakerCommandExecutor
- Added integration tests to verify both retry and no-retry behavior
- Created utility methods for test setup and configuration

This enhancement improves resilience for long-running commands like blocking
operations, allowing them to transparently continue on the failover cluster
without client-side errors.

* Update src/main/java/redis/clients/jedis/mcf/CircuitBreakerCommandExecutor.java

Co-authored-by: Copilot <[email protected]>
# Conflicts:
#	src/main/java/redis/clients/jedis/mcf/CircuitBreakerCommandExecutor.java

* format

format & clean-up

* fix FailoverIntegrationTest.testInflightCommandsAreNotRetriedAfterFailover

    blpop timeout should be less than async command timeout to prevent completing with java.util.concurrent.TimeoutException instead of actuall command failure

* Address comments from review
   - rename retryFailedInflightCommands->retryOnFailover
   - remove check for CB in OPEN state

* remove unused method

---------

Co-authored-by: Copilot <[email protected]>

* [automatic-failover] Fix formatting for anticipated AA Failover changes (#4183)

* fix formatting for anticipated failover changes

* additional formatting to cover given folders

* [automatic failover] Implement HealtStatusManager + weighted endpoints (#4189)

* - weighted cluster seleciton
- Healtstatus manager with initial listener and registration logic
- pluggable health checker strategy  introduced,  these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy,
- fix failing tests impacted from weighted clusters

* - add builder for ClusterConfig
- add echo ot CommandObjects and UnifiedJEdis
- improve StrategySupplier by accepting jedisclientconfig
- adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig
- make healthchecks disabled by default
- drop noOpStrategy
-  add unit&integration tests for health check

* - fix naming

* clean up and mark override methods

* fix link in javadoc

* fix formatting

* - fix double registered listeners in healtstatusmgr
- clear redundant catch
- replace failover options and drop failoveroptions class
- remove forced_unhealthy from healthstatus
- fix failback check
- add disabled flag to cluster
- update/fix related tests

* Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java

Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Copilot <[email protected]>

* [automatic failover] Implement add/remove endpoints (#4200)

* - weighted cluster seleciton
- Healtstatus manager with initial listener and registration logic
- pluggable health checker strategy  introduced,  these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy,
- fix failing tests impacted from weighted clusters

* - add builder for ClusterConfig
- add echo ot CommandObjects and UnifiedJEdis
- improve StrategySupplier by accepting jedisclientconfig
- adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig
- make healthchecks disabled by default
- drop noOpStrategy
-  add unit&integration tests for health check

* - fix naming

* clean up and mark override methods

* fix link in javadoc

* fix formatting

* - fix double registered listeners in healtstatusmgr
- clear redundant catch
- replace failover options and drop failoveroptions class
- remove forced_unhealthy from healthstatus
- fix failback check
- add disabled flag to cluster
- update/fix related tests

* Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java

Co-authored-by: Copilot <[email protected]>

* - add remove endpoints

* - replace cluster disabled with failbackCandidate
- replace failback enabled with failbacksupported in client
- fix formatting
- set defaults

* - remove failback candidate
- fix failing tests

* - fix remove logic
- fix failing tests

---------

Co-authored-by: Copilot <[email protected]>

* [automatic failover] Implement failback with given grace period (#4204)

* - weighted cluster seleciton
- Healtstatus manager with initial listener and registration logic
- pluggable health checker strategy  introduced,  these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy,
- fix failing tests impacted from weighted clusters

* - add builder for ClusterConfig
- add echo ot CommandObjects and UnifiedJEdis
- improve StrategySupplier by accepting jedisclientconfig
- adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig
- make healthchecks disabled by default
- drop noOpStrategy
-  add unit&integration tests for health check

* - fix naming

* clean up and mark override methods

* fix link in javadoc

* fix formatting

* - fix double registered listeners in healtstatusmgr
- clear redundant catch
- replace failover options and drop failoveroptions class
- remove forced_unhealthy from healthstatus
- fix failback check
- add disabled flag to cluster
- update/fix related tests

* Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java

Co-authored-by: Copilot <[email protected]>

* - add remove endpoints

* - replace cluster disabled with failbackCandidate
- replace failback enabled with failbacksupported in client
- fix formatting
- set defaults

* - remove failback candidate
- fix failing tests

* - fix remove logic
- fix failing tests

* - periodic failback checks
- introduce graceperiod
- fix issue when CB is forced_open and gracePeriod is completed

* - format

* - fix timing issue

---------

Co-authored-by: Copilot <[email protected]>

* [automatic failover] Implement wait on healthCheck results during client init (#4207)

* - weighted cluster seleciton
- Healtstatus manager with initial listener and registration logic
- pluggable health checker strategy  introduced,  these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy,
- fix failing tests impacted from weighted clusters

* - add builder for ClusterConfig
- add echo ot CommandObjects and UnifiedJEdis
- improve StrategySupplier by accepting jedisclientconfig
- adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig
- make healthchecks disabled by default
- drop noOpStrategy
-  add unit&integration tests for health check

* - fix naming

* clean up and mark override methods

* fix link in javadoc

* fix formatting

* - fix double registered listeners in healtstatusmgr
- clear redundant catch
- replace failover options and drop failoveroptions class
- remove forced_unhealthy from healthstatus
- fix failback check
- add disabled flag to cluster
- update/fix related tests

* Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java

Co-authored-by: Copilot <[email protected]>

* - add remove endpoints

* - replace cluster disabled with failbackCandidate
- replace failback enabled with failbacksupported in client
- fix formatting
- set defaults

* - remove failback candidate
- fix failing tests

* - fix remove logic
- fix failing tests

* - periodic failback checks
- introduce graceperiod
- fix issue when CB is forced_open and gracePeriod is completed

* - introduce StatusTracker with purpose of waiting initial healthcheck results during consturction of provider
- add HealthStatus.UNKNOWN as default for Cluster
- handle status changes in order of events during initialization
- add tests for status tracker and orderingof events
- fix impacted unit&integ tests

* - introduce forceActiveCluster by duration
- fix formatting

* - fix failing tests by waiting on clusters to get healthy

* - fix failing scenario test
- downgrade logback version for slf4j compatibility
- increase timeouts for faultInjector

* - adressing reviews and feedback

* - fix formatting

* - fix formatting

* - get rid of the queue and event ordering for healthstatus change in MultiClusterPooledConnectionProvider
- add test for init and post init events
- fix failing tests

* - replace use of reflection with helper methods
- fix failing tests due to method name change

* - replace  'sleep' with 'await', feedback from @a-TODO-rov

---------

Co-authored-by: Copilot <[email protected]>

* [automatic failover] Introduce fast failover mode - a thread-sync-free approach with builders (#4226)

* - weighted cluster seleciton
- Healtstatus manager with initial listener and registration logic
- pluggable health checker strategy  introduced,  these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy,
- fix failing tests impacted from weighted clusters

* - add builder for ClusterConfig
- add echo ot CommandObjects and UnifiedJEdis
- improve StrategySupplier by accepting jedisclientconfig
- adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig
- make healthchecks disabled by default
- drop noOpStrategy
-  add unit&integration tests for health check

* - fix naming

* clean up and mark override methods

* fix link in javadoc

* fix formatting

* - fix double registered listeners in healtstatusmgr
- clear redundant catch
- replace failover options and drop failoveroptions class
- remove forced_unhealthy from healthstatus
- fix failback check
- add disabled flag to cluster
- update/fix related tests

* Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java

Co-authored-by: Copilot <[email protected]>

* - add remove endpoints

* - replace cluster disabled with failbackCandidate
- replace failback enabled with failbacksupported in client
- fix formatting
- set defaults

* - remove failback candidate
- fix failing tests

* - fix remove logic
- fix failing tests

* - periodic failback checks
- introduce graceperiod
- fix issue when CB is forced_open and gracePeriod is completed

* - introduce StatusTracker with purpose of waiting initial healthcheck results during consturction of provider
- add HealthStatus.UNKNOWN as default for Cluster
- handle status changes in order of events during initialization
- add tests for status tracker and orderingof events
- fix impacted unit&integ tests

* - introduce forceActiveCluster by duration
- fix formatting

* - fix failing tests by waiting on clusters to get healthy

* - fix failing scenario test
- downgrade logback version for slf4j compatibility
- increase timeouts for faultInjector

* - adressing reviews and feedback

* - fix formatting

* - fix formatting

* - get rid of the queue and event ordering for healthstatus change in MultiClusterPooledConnectionProvider
- add test for init and post init events
- fix failing tests

* - replace use of reflection with helper methods
- fix failing tests due to method name change

* - introduce clusterSwitchEvent and drop clusterFailover post processor
- fix broken echostrategy due to connection issue
- make healtthCheckStrategy closable and close on
- adding fastfailover mode to config and provider
- add local failover tests for total failover duration

* - introduce fastfailover using  objectMaker injection into connectionFactory

* - polish

* - cleanup

* - improve healtcheck thread visibility

* - introduce TrackingConnectionPool with FailFastConnectionFactory
- added builders to connection and connectionFactory
- introduce initializtionTracker to track list of connections during their construction.

* - return broken source as usual
- do not throw exception is failover already happening

* - unblock waiting threads

* - failover by closing the pool

* - formatting

* - check waiters and active/idle connections to force disconnect

* - add builder to trackingconnectionpool

* - fix failing tests due to mocked ctor for trackingConnectionPool

* - replace initTracker with split initializtion of conn

* - refactor on builders and ctors

* - undo format

* - clena up

* - add exceptions to logs

* - add max wait duration and minConsecutiveSuccessCount to healthCheckStrategy

* - replace  'sleep' with 'await', feedback from @a-TODO-rov

* - fix status tracker tests

---------

Co-authored-by: Copilot <[email protected]>

* [automatic failover] Add tests: EchoStrategy does recover on connection error (#4230)

* Add tests: EchoStrategy does recover on connection error

* log error's on exception

* [automatic failover] Replace 'SimpleEntry' with 'HealthCheckResult' (#4236)

* - replace SimpleEntry with HealthCheckResult

* - format

* [automatic failover] Add rate limiter to test MultiThreadedFakeApp (#4248)

* Add option for rate limiter to MultiThreadedFakeApp

* retrigger checks

* [automatic failover] Implement lag-aware api (#4235)

* - lag-aware api

* - introduce healtCheckMetaConfig
- add config objects for both EchoStrategy and LagAwareStrategy
- make StrategySupplier interface generic
- fix impacted tests

* - fix issues with docs and log

* Fix EchoStrategyIntegrationTest after merge

* Implement resolve bdbId based on configured HostAndPort

* format & merge RedisRestAPI unit tests

* introduce builder for LagAwareStrategy

* configurable extended-check

   - provide option to enable/disable lag check
   - default to lag-aware check with AVAILABILITY_LAG_TOLERANCE_DEFAULT of 100ms

* - fix java doc issue

* - formatting

* -fix doc issue

* - formatting

* revert introduce healtCheckMetaConfig

    - introduces type unsafe warnings, revert till they are resovled

* Add list constructor

* Add option for rate limiter to MultiThreadedFakeApp

* Use /v1/local/bdbs/<int:uid>/endpoint/availability

   - use /v1/local/bdbs/<int:uid>/endpoint/availability instead of  availability address vs /v1/bdbs/<int:uid>/availability

* Add builder factory method for base HealthCheckStrategy.Config

* format

* format

* revert to /v1/bdbs/<int:uid>/availability

  reading again the spec /local is inteded to be used when configuring LoadBalancers

* revert to /v1/bdbs/<int:uid>/availability

  reading again the spec /local is inteded to be used when configuring LoadBalancers

* format

* fix unit test after revert of local

* format

* Address review comments
  - remove misleading default value comment
  - renamed Config.endpoint to Config.restEndpoint
  - renamed dataPathAvailability -> databaseAvailability

---------

Co-authored-by: ggivo <[email protected]>

* [automatic failover] Refactor cluster to replace `HealthStatus` with `HealthCheck` itself (#4244)

* - set sleep duration 1 ms for forceDisconnect

* - refactor cluster to replace HealthStatus with HealthCheck itself

* - move Endpoint to parent package

* - format

* - fix typo

* - adjust tests with new provider behaviour with HealthCheck

* Address review comments
   - rename getTimeout() -> getMaxWaitFor()
   - remove unnecessary noHealthChecks

* resolve merge conflicts

* format

* format

---------

Co-authored-by: Ivo Gaydazhiev <[email protected]>

* [automatic failover] fix: health checks continue to run even after provider is closed. (#4252)

* fix: health checks continue to run even after provider is closed.

Closes #4251

* format

* format & fix tests after merge

format

* [automatic failover] Improve AA failover scenario test (#4249)

* Improve AA failover test

* Fix endpoint name

* Count failedCommandsAfterFailover

* do not enforce formating on ActiveActiveFailoverTest

---------

Co-authored-by: Ivo Gaydazhiev <[email protected]>

* [automatic failover ]fix: only notify health status listeners on actual state changes #4255 (#4256)

* fix: only notify health status listeners on actual state changes #4255

  - Replace overlapping health checks with sequential execution
  - use wasUpdated flag to prevent false notifications from timestamp conflicts.
    Eliminates race conditions causing spurious failovers.

Closes: #4255

* format

* Update src/main/java/redis/clients/jedis/mcf/HealthCheckImpl.java

Co-authored-by: atakavci <[email protected]>

* revert to fixed rate health checks

* fix ActiveActiveLocalFailoverTest

  - reduced endpoint pause to 10s to speed up test
  - ensure test endpoint is eavailable before startign the test (awaits up to endpoint pause time)

---------

Co-authored-by: atakavci <[email protected]>

* Connection.forceDisconnect should also set the connection as broken #4233 (#4258)

Address: Connection.forceDisconnect should also set the connection as broken #4233

* [automatic failover] Update failover docs (#4257)

* Initial draft

* Cover health checks

* Add javadoc's for MultiClusterClientConfig

* Fix docs for health checks

* Fix javadoc's

* Update wordlist.txt

---------

Co-authored-by: Ivo Gaydazhiev <[email protected]>

* rever hbase-formatter.xml changes

   - ensure hbase-formatter.xml match current one in master

* reformat to comply with hbase-formater rules in master

* fix errors after merge

   - setClusterFailoverPostProcessor renamed
   - format AutomaticFailoverTest

* format
    - Endpoint.java

* format

    - ActiveActiveLocalFailoverTest
    - FailbackMechanismIntegrationTest
    - FailbackMechanismUnitTest
    - PeriodicFailbackTest
    - StatusTrackerTest

format

    - ActiveActiveLocalFailoverTest.java

* Add failover containers for testing to local test infra

* [automatic failover] Address review comments (#4264)

Address review comments

  - replace magic numbers in tests with constats

---------

Co-authored-by: ggivo <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Igor Malinovskiy <[email protected]>
ggivo added a commit that referenced this pull request Oct 2, 2025
* Setup test infra for automatic failover tests
  * Add toxyproxy
  * Add test endpoints
  * Add example tests

* [automatic-failover] Enforce formatting for automatic failover codebase (#4176)

* enforce formating for mcf

* enforce formating for mcf

* Trigger integration test jobs  feature branches

* [automatic failover] Improve failover resilience by evaluating circuit breaker before retries (#4178)

* fix: Change evaluation order of circuit breaker and retry mechanisms

This commit modifies the order in which circuit breaker and retry mechanisms
are evaluated during failover scenarios. Previously, the circuit breaker was
applied after all retries were exhausted, which could delay failover in
unstable network conditions. Now, the circuit breaker counts each connection
error separately, even during retry attempts.

* format

* format

* use java-8 compatible formater version

* fix AutomaticFailoverTest

* formating

* [automatic-failover] Implement failover retry for in-flight commands (#4175)

* Implement failover retry for in-flight commands

This change adds support for retrying in-flight commands when a Redis connection
fails and the client automatically fails over to another cluster. The feature
is configurable through the FailoverOptions builder.

Key changes:
- Added retryFailedInflightCommands option to FailoverOptions
- Implemented retry logic in CircuitBreakerCommandExecutor
- Added integration tests to verify both retry and no-retry behavior
- Created utility methods for test setup and configuration

This enhancement improves resilience for long-running commands like blocking
operations, allowing them to transparently continue on the failover cluster
without client-side errors.

* Update src/main/java/redis/clients/jedis/mcf/CircuitBreakerCommandExecutor.java

Co-authored-by: Copilot <[email protected]>
# Conflicts:
#	src/main/java/redis/clients/jedis/mcf/CircuitBreakerCommandExecutor.java

* format

format & clean-up

* fix FailoverIntegrationTest.testInflightCommandsAreNotRetriedAfterFailover

    blpop timeout should be less than async command timeout to prevent completing with java.util.concurrent.TimeoutException instead of actuall command failure

* Address comments from review
   - rename retryFailedInflightCommands->retryOnFailover
   - remove check for CB in OPEN state

* remove unused method

---------

Co-authored-by: Copilot <[email protected]>

* [automatic-failover] Fix formatting for anticipated AA Failover changes (#4183)

* fix formatting for anticipated failover changes

* additional formatting to cover given folders

* [automatic failover] Implement HealtStatusManager + weighted endpoints (#4189)

* - weighted cluster seleciton
- Healtstatus manager with initial listener and registration logic
- pluggable health checker strategy  introduced,  these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy,
- fix failing tests impacted from weighted clusters

* - add builder for ClusterConfig
- add echo ot CommandObjects and UnifiedJEdis
- improve StrategySupplier by accepting jedisclientconfig
- adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig
- make healthchecks disabled by default
- drop noOpStrategy
-  add unit&integration tests for health check

* - fix naming

* clean up and mark override methods

* fix link in javadoc

* fix formatting

* - fix double registered listeners in healtstatusmgr
- clear redundant catch
- replace failover options and drop failoveroptions class
- remove forced_unhealthy from healthstatus
- fix failback check
- add disabled flag to cluster
- update/fix related tests

* Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java

Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Copilot <[email protected]>

* [automatic failover] Implement add/remove endpoints (#4200)

* - weighted cluster seleciton
- Healtstatus manager with initial listener and registration logic
- pluggable health checker strategy  introduced,  these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy,
- fix failing tests impacted from weighted clusters

* - add builder for ClusterConfig
- add echo ot CommandObjects and UnifiedJEdis
- improve StrategySupplier by accepting jedisclientconfig
- adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig
- make healthchecks disabled by default
- drop noOpStrategy
-  add unit&integration tests for health check

* - fix naming

* clean up and mark override methods

* fix link in javadoc

* fix formatting

* - fix double registered listeners in healtstatusmgr
- clear redundant catch
- replace failover options and drop failoveroptions class
- remove forced_unhealthy from healthstatus
- fix failback check
- add disabled flag to cluster
- update/fix related tests

* Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java

Co-authored-by: Copilot <[email protected]>

* - add remove endpoints

* - replace cluster disabled with failbackCandidate
- replace failback enabled with failbacksupported in client
- fix formatting
- set defaults

* - remove failback candidate
- fix failing tests

* - fix remove logic
- fix failing tests

---------

Co-authored-by: Copilot <[email protected]>

* [automatic failover] Implement failback with given grace period (#4204)

* - weighted cluster seleciton
- Healtstatus manager with initial listener and registration logic
- pluggable health checker strategy  introduced,  these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy,
- fix failing tests impacted from weighted clusters

* - add builder for ClusterConfig
- add echo ot CommandObjects and UnifiedJEdis
- improve StrategySupplier by accepting jedisclientconfig
- adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig
- make healthchecks disabled by default
- drop noOpStrategy
-  add unit&integration tests for health check

* - fix naming

* clean up and mark override methods

* fix link in javadoc

* fix formatting

* - fix double registered listeners in healtstatusmgr
- clear redundant catch
- replace failover options and drop failoveroptions class
- remove forced_unhealthy from healthstatus
- fix failback check
- add disabled flag to cluster
- update/fix related tests

* Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java

Co-authored-by: Copilot <[email protected]>

* - add remove endpoints

* - replace cluster disabled with failbackCandidate
- replace failback enabled with failbacksupported in client
- fix formatting
- set defaults

* - remove failback candidate
- fix failing tests

* - fix remove logic
- fix failing tests

* - periodic failback checks
- introduce graceperiod
- fix issue when CB is forced_open and gracePeriod is completed

* - format

* - fix timing issue

---------

Co-authored-by: Copilot <[email protected]>

* [automatic failover] Implement wait on healthCheck results during client init (#4207)

* - weighted cluster seleciton
- Healtstatus manager with initial listener and registration logic
- pluggable health checker strategy  introduced,  these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy,
- fix failing tests impacted from weighted clusters

* - add builder for ClusterConfig
- add echo ot CommandObjects and UnifiedJEdis
- improve StrategySupplier by accepting jedisclientconfig
- adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig
- make healthchecks disabled by default
- drop noOpStrategy
-  add unit&integration tests for health check

* - fix naming

* clean up and mark override methods

* fix link in javadoc

* fix formatting

* - fix double registered listeners in healtstatusmgr
- clear redundant catch
- replace failover options and drop failoveroptions class
- remove forced_unhealthy from healthstatus
- fix failback check
- add disabled flag to cluster
- update/fix related tests

* Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java

Co-authored-by: Copilot <[email protected]>

* - add remove endpoints

* - replace cluster disabled with failbackCandidate
- replace failback enabled with failbacksupported in client
- fix formatting
- set defaults

* - remove failback candidate
- fix failing tests

* - fix remove logic
- fix failing tests

* - periodic failback checks
- introduce graceperiod
- fix issue when CB is forced_open and gracePeriod is completed

* - introduce StatusTracker with purpose of waiting initial healthcheck results during consturction of provider
- add HealthStatus.UNKNOWN as default for Cluster
- handle status changes in order of events during initialization
- add tests for status tracker and orderingof events
- fix impacted unit&integ tests

* - introduce forceActiveCluster by duration
- fix formatting

* - fix failing tests by waiting on clusters to get healthy

* - fix failing scenario test
- downgrade logback version for slf4j compatibility
- increase timeouts for faultInjector

* - adressing reviews and feedback

* - fix formatting

* - fix formatting

* - get rid of the queue and event ordering for healthstatus change in MultiClusterPooledConnectionProvider
- add test for init and post init events
- fix failing tests

* - replace use of reflection with helper methods
- fix failing tests due to method name change

* - replace  'sleep' with 'await', feedback from @a-TODO-rov

---------

Co-authored-by: Copilot <[email protected]>

* [automatic failover] Introduce fast failover mode - a thread-sync-free approach with builders (#4226)

* - weighted cluster seleciton
- Healtstatus manager with initial listener and registration logic
- pluggable health checker strategy  introduced,  these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy,
- fix failing tests impacted from weighted clusters

* - add builder for ClusterConfig
- add echo ot CommandObjects and UnifiedJEdis
- improve StrategySupplier by accepting jedisclientconfig
- adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig
- make healthchecks disabled by default
- drop noOpStrategy
-  add unit&integration tests for health check

* - fix naming

* clean up and mark override methods

* fix link in javadoc

* fix formatting

* - fix double registered listeners in healtstatusmgr
- clear redundant catch
- replace failover options and drop failoveroptions class
- remove forced_unhealthy from healthstatus
- fix failback check
- add disabled flag to cluster
- update/fix related tests

* Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java

Co-authored-by: Copilot <[email protected]>

* - add remove endpoints

* - replace cluster disabled with failbackCandidate
- replace failback enabled with failbacksupported in client
- fix formatting
- set defaults

* - remove failback candidate
- fix failing tests

* - fix remove logic
- fix failing tests

* - periodic failback checks
- introduce graceperiod
- fix issue when CB is forced_open and gracePeriod is completed

* - introduce StatusTracker with purpose of waiting initial healthcheck results during consturction of provider
- add HealthStatus.UNKNOWN as default for Cluster
- handle status changes in order of events during initialization
- add tests for status tracker and orderingof events
- fix impacted unit&integ tests

* - introduce forceActiveCluster by duration
- fix formatting

* - fix failing tests by waiting on clusters to get healthy

* - fix failing scenario test
- downgrade logback version for slf4j compatibility
- increase timeouts for faultInjector

* - adressing reviews and feedback

* - fix formatting

* - fix formatting

* - get rid of the queue and event ordering for healthstatus change in MultiClusterPooledConnectionProvider
- add test for init and post init events
- fix failing tests

* - replace use of reflection with helper methods
- fix failing tests due to method name change

* - introduce clusterSwitchEvent and drop clusterFailover post processor
- fix broken echostrategy due to connection issue
- make healtthCheckStrategy closable and close on
- adding fastfailover mode to config and provider
- add local failover tests for total failover duration

* - introduce fastfailover using  objectMaker injection into connectionFactory

* - polish

* - cleanup

* - improve healtcheck thread visibility

* - introduce TrackingConnectionPool with FailFastConnectionFactory
- added builders to connection and connectionFactory
- introduce initializtionTracker to track list of connections during their construction.

* - return broken source as usual
- do not throw exception is failover already happening

* - unblock waiting threads

* - failover by closing the pool

* - formatting

* - check waiters and active/idle connections to force disconnect

* - add builder to trackingconnectionpool

* - fix failing tests due to mocked ctor for trackingConnectionPool

* - replace initTracker with split initializtion of conn

* - refactor on builders and ctors

* - undo format

* - clena up

* - add exceptions to logs

* - add max wait duration and minConsecutiveSuccessCount to healthCheckStrategy

* - replace  'sleep' with 'await', feedback from @a-TODO-rov

* - fix status tracker tests

---------

Co-authored-by: Copilot <[email protected]>

* [automatic failover] Add tests: EchoStrategy does recover on connection error (#4230)

* Add tests: EchoStrategy does recover on connection error

* log error's on exception

* [automatic failover] Replace 'SimpleEntry' with 'HealthCheckResult' (#4236)

* - replace SimpleEntry with HealthCheckResult

* - format

* [automatic failover] Add rate limiter to test MultiThreadedFakeApp (#4248)

* Add option for rate limiter to MultiThreadedFakeApp

* retrigger checks

* [automatic failover] Implement lag-aware api (#4235)

* - lag-aware api

* - introduce healtCheckMetaConfig
- add config objects for both EchoStrategy and LagAwareStrategy
- make StrategySupplier interface generic
- fix impacted tests

* - fix issues with docs and log

* Fix EchoStrategyIntegrationTest after merge

* Implement resolve bdbId based on configured HostAndPort

* format & merge RedisRestAPI unit tests

* introduce builder for LagAwareStrategy

* configurable extended-check

   - provide option to enable/disable lag check
   - default to lag-aware check with AVAILABILITY_LAG_TOLERANCE_DEFAULT of 100ms

* - fix java doc issue

* - formatting

* -fix doc issue

* - formatting

* revert introduce healtCheckMetaConfig

    - introduces type unsafe warnings, revert till they are resovled

* Add list constructor

* Add option for rate limiter to MultiThreadedFakeApp

* Use /v1/local/bdbs/<int:uid>/endpoint/availability

   - use /v1/local/bdbs/<int:uid>/endpoint/availability instead of  availability address vs /v1/bdbs/<int:uid>/availability

* Add builder factory method for base HealthCheckStrategy.Config

* format

* format

* revert to /v1/bdbs/<int:uid>/availability

  reading again the spec /local is inteded to be used when configuring LoadBalancers

* revert to /v1/bdbs/<int:uid>/availability

  reading again the spec /local is inteded to be used when configuring LoadBalancers

* format

* fix unit test after revert of local

* format

* Address review comments
  - remove misleading default value comment
  - renamed Config.endpoint to Config.restEndpoint
  - renamed dataPathAvailability -> databaseAvailability

---------

Co-authored-by: ggivo <[email protected]>

* [automatic failover] Refactor cluster to replace `HealthStatus` with `HealthCheck` itself (#4244)

* - set sleep duration 1 ms for forceDisconnect

* - refactor cluster to replace HealthStatus with HealthCheck itself

* - move Endpoint to parent package

* - format

* - fix typo

* - adjust tests with new provider behaviour with HealthCheck

* Address review comments
   - rename getTimeout() -> getMaxWaitFor()
   - remove unnecessary noHealthChecks

* resolve merge conflicts

* format

* format

---------

Co-authored-by: Ivo Gaydazhiev <[email protected]>

* [automatic failover] fix: health checks continue to run even after provider is closed. (#4252)

* fix: health checks continue to run even after provider is closed.

Closes #4251

* format

* format & fix tests after merge

format

* [automatic failover] Improve AA failover scenario test (#4249)

* Improve AA failover test

* Fix endpoint name

* Count failedCommandsAfterFailover

* do not enforce formating on ActiveActiveFailoverTest

---------

Co-authored-by: Ivo Gaydazhiev <[email protected]>

* [automatic failover ]fix: only notify health status listeners on actual state changes #4255 (#4256)

* fix: only notify health status listeners on actual state changes #4255

  - Replace overlapping health checks with sequential execution
  - use wasUpdated flag to prevent false notifications from timestamp conflicts.
    Eliminates race conditions causing spurious failovers.

Closes: #4255

* format

* Update src/main/java/redis/clients/jedis/mcf/HealthCheckImpl.java

Co-authored-by: atakavci <[email protected]>

* revert to fixed rate health checks

* fix ActiveActiveLocalFailoverTest

  - reduced endpoint pause to 10s to speed up test
  - ensure test endpoint is eavailable before startign the test (awaits up to endpoint pause time)

---------

Co-authored-by: atakavci <[email protected]>

* Connection.forceDisconnect should also set the connection as broken #4233 (#4258)

Address: Connection.forceDisconnect should also set the connection as broken #4233

* [automatic failover] Update failover docs (#4257)

* Initial draft

* Cover health checks

* Add javadoc's for MultiClusterClientConfig

* Fix docs for health checks

* Fix javadoc's

* Update wordlist.txt

---------

Co-authored-by: Ivo Gaydazhiev <[email protected]>

* rever hbase-formatter.xml changes

   - ensure hbase-formatter.xml match current one in master

* reformat to comply with hbase-formater rules in master

* fix errors after merge

   - setClusterFailoverPostProcessor renamed
   - format AutomaticFailoverTest

* format
    - Endpoint.java

* format

    - ActiveActiveLocalFailoverTest
    - FailbackMechanismIntegrationTest
    - FailbackMechanismUnitTest
    - PeriodicFailbackTest
    - StatusTrackerTest

format

    - ActiveActiveLocalFailoverTest.java

* Add failover containers for testing to local test infra

* Introduce Builders for classes based on UnifiedJedis

* Cleanup Builders and fix minor issues

* Fix formatting #1

* Improve javadocs

* Enable params visibility in the CI

* Remove redundant JVM flags

* Guard the reflection tests

Skip the reflection tests if the "with-param-names" flag is not enabled during the test run

* [automatic failover] Address review comments (#4264)

Address review comments

  - replace magic numbers in tests with constats

* Add unit tests for builders

* Address review comments

* Add missing changes after revert

* Fix error after merge
  -  resolve cannot find symbol
             Error:    symbol:   class MultiClusterPooledConnectionProvider

* Allow overriding CommandObjects in builders

* Fix formatting

* Pull clientConfig to the AbstractClientBuilder and remove redundant redisProtocol

* Create default client config in AbstractClientBuilder

Instead of creating it in createDefaultConnectionProvider()

* Use builder in JedisPooled tests

* Use builder in ClusterCommand tests

* Fix formatting

* Add missing value for maxTotalRetriesDuration

* [churn] Code clean up

* Use builder in SentineledConnectionProviderTest

---------

Co-authored-by: ggivo <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: atakavci <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants