Skip to content

[automatic failover] Tracking Issue: HealthCheckStatus Incorrectly Overwritten Leading to False State Transitions #4255

@ggivo

Description

@ggivo

Issue Summary

The safeUpdate() method in HealthCheckImpl has a bug where it notifies listeners of status changes even when the actual stored state is not updated, leading to false failover triggers and HEALTHY/UNHEALTHY state toggling.

Root Cause Analysis

Bug

  • Scenario: An older health check result arrives after a newer one has already been stored
    • Expected Behavior: State should NOT be updated (timestamp check prevents this )
    • Listeners should NOT be notified (no actual state change occurred)
  • Actual Behavior:
    • State is correctly NOT updated
    • But listeners are incorrectly notified because oldResult.getStatus() != status compares the stored status against the rejected input status

Example Scenario

Time: T1 - Health check starts (timestamp: 1000)
Time: T2 - Health check starts (timestamp: 2000)  
Time: T3 - T2 completes first: HEALTHY → stored (timestamp: 2000)
Time: T4 - T1 completes later: UNHEALTHY → rejected (older timestamp)
         - But listeners notified: HEALTHY → UNHEALTHY (FALSE!)
         - Actual state remains: HEALTHY

Problematic Code:

private void safeUpdate(long owner, HealthStatus status) {
    HealthCheckResult newResult = new HealthCheckResult(owner, status);
    HealthCheckResult oldResult = resultRef.getAndUpdate(current -> {
        if (current.getTimestamp() < owner) {
            return newResult;  // State updated
        }
        return current;        // State NOT updated
    });
    if (oldResult.getStatus() != status) {  // : Always compares against input status
        // notify listeners
        notifyListeners(oldResult.getStatus(), status);  // False notification!
    }
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions