
@solisamicus
Contributor

feat: add HTTP health check API for Kubernetes integration

Add comprehensive HTTP health check support for liveness and readiness probes to enable
seamless Kubernetes deployment and container orchestration.

Background

Kubernetes requires HTTP endpoints for health probes to determine container lifecycle:

  • Liveness probes: detect if container should be restarted
  • Readiness probes: control traffic routing to healthy instances

This implementation provides a robust, extensible health check system that integrates
with dubbo-go's existing architecture while allowing users to define custom health logic.

Implementation

Core Components

  • HealthCheckConfig: YAML-configurable settings for ports, paths, and timeouts
  • HealthChecker Interface: Extensible interface for user-defined health logic
  • HealthCheckServer: HTTP server exposing /health/live and /health/ready endpoints
  • Built-in Checkers: Ready-to-use implementations for common scenarios
  • Global Registry: Thread-safe system for managing health checkers
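The checker interface and the thread-safe registry could look roughly like the sketch below. The three method signatures come from the custom checker example in this description. The names `checkerRegistry`, `Register`, `Snapshot`, and `alwaysUp` are hypothetical and are not the PR's actual API.

```go
package main

import (
	"context"
	"sync"
)

// HealthChecker mirrors the methods used by the custom checker example.
type HealthChecker interface {
	CheckLiveness(ctx context.Context) bool
	CheckReadiness(ctx context.Context) bool
	Name() string
}

// checkerRegistry is a hypothetical concurrency-safe registry keyed by Name().
type checkerRegistry struct {
	mu       sync.RWMutex
	checkers map[string]HealthChecker
}

func newCheckerRegistry() *checkerRegistry {
	return &checkerRegistry{checkers: make(map[string]HealthChecker)}
}

// Register adds a checker, replacing any existing one with the same Name().
func (r *checkerRegistry) Register(c HealthChecker) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.checkers[c.Name()] = c
}

// Snapshot copies the registered checkers so probes can run them
// without holding the lock.
func (r *checkerRegistry) Snapshot() []HealthChecker {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make([]HealthChecker, 0, len(r.checkers))
	for _, c := range r.checkers {
		out = append(out, c)
	}
	return out
}

// alwaysUp is a trivial checker used only to exercise the registry.
type alwaysUp struct{ name string }

func (a alwaysUp) CheckLiveness(context.Context) bool  { return true }
func (a alwaysUp) CheckReadiness(context.Context) bool { return true }
func (a alwaysUp) Name() string                        { return a.name }
```

Returning a copy from `Snapshot` keeps slow checks from blocking concurrent registrations.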

Key Features

  • Configurable Endpoints: Customizable ports and paths via configuration
  • Extensible Logic: Users can implement HealthChecker interface for custom checks
  • Composite Support: Combine multiple health checkers with AND logic
  • Detailed Responses: JSON responses with status, timestamp, and detailed information
  • Timeout Protection: Configurable timeouts prevent hanging health checks
  • Graceful Integration: Seamless integration with dubbo-go lifecycle management
  • Thread Safety: Concurrent-safe health checker registry

Changes Made

  • Add HealthCheckConfig to MetricsConfig with port, paths, timeout settings
  • Implement HealthChecker interface for user-defined health check logic
  • Add HealthCheckServer with /health/live and /health/ready endpoints
  • Support CompositeHealthChecker for combining multiple checkers
  • Provide built-in checkers: Default, Dubbo, GracefulShutdown, Timeout
  • Integrate with metrics system for automatic server lifecycle management
  • Add thread-safe global health checker registry
  • Support detailed health results with JSON response format
  • Include graceful shutdown integration and timeout protection

Usage

Configuration

metrics:
  enable: true
  health-check:
    enabled: true
    port: "8080"
    live-path: "/health/live"
    ready-path: "/health/ready"
    timeout: "10s"
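On the Go side, these keys might map onto a struct like the one below. This is a sketch. The type, field, and method names are hypothetical, and it assumes the example values above also serve as defaults.

```go
package main

import "net"

// healthCheckConfig is a hypothetical Go mirror of the YAML keys above.
type healthCheckConfig struct {
	Enabled   bool
	Port      string
	LivePath  string
	ReadyPath string
	Timeout   string
}

// withDefaults fills empty fields, assuming the example values are the defaults.
func (c healthCheckConfig) withDefaults() healthCheckConfig {
	if c.Port == "" {
		c.Port = "8080"
	}
	if c.LivePath == "" {
		c.LivePath = "/health/live"
	}
	if c.ReadyPath == "" {
		c.ReadyPath = "/health/ready"
	}
	if c.Timeout == "" {
		c.Timeout = "10s"
	}
	return c
}

// addr returns a listen address for net/http that binds all interfaces, e.g. ":8080".
func (c healthCheckConfig) addr() string {
	return net.JoinHostPort("", c.Port)
}
```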

Custom Health Checker

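import (
    "context"
    "database/sql"
)

type MyHealthChecker struct {
    db *sql.DB
}

// CheckLiveness: the process is running, so report alive.
func (m *MyHealthChecker) CheckLiveness(ctx context.Context) bool {
    return true // Process is alive
}

// CheckReadiness pings the database. PingContext honors the probe's deadline,
// whereas Ping ignores ctx and could outlive the health check timeout.
func (m *MyHealthChecker) CheckReadiness(ctx context.Context) bool {
    return m.db.PingContext(ctx) == nil // Check database connection
}

func (m *MyHealthChecker) Name() string {
    return "MyApp"
}

// Register custom checker
server.SetHealthChecker(&MyHealthChecker{db: database})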

Kubernetes Deployment

livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3

API Responses

Healthy Response (200 OK)

{
  "status": "UP",
  "timestamp": 1640995200000,
  "details": {
    "check": "readiness",
    "database": "connected",
    "services": "exported"
  }
}

Unhealthy Response (503 Service Unavailable)

{
  "status": "DOWN", 
  "timestamp": 1640995200000,
  "message": "Database connection failed",
  "details": {
    "check": "readiness",
    "database": "unavailable",
    "reason": "connection_timeout"
  }
}

Benefits

  • Cloud Native: Full Kubernetes compatibility with standard probe endpoints
  • Zero Downtime: Readiness checks keep traffic away from instances that are not yet ready, supporting rolling deployments
  • Fault Tolerance: Instances that fail readiness are removed from Service endpoints until they recover
  • Observability: Detailed health status information for debugging
  • Extensibility: Plugin architecture for custom health logic
  • Production Ready: Timeout protection, graceful shutdown, error handling
  • Backward Compatible: Disabled by default, no impact on existing deployments

Testing

Health check endpoints can be tested using:

curl http://localhost:8080/health/live
curl http://localhost:8080/health/ready
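The same requests can also run in-process with Go's `net/http/httptest`, without opening a port. The sketch below assumes a simple wiring in which each probe function decides between 200 and 503. `newHealthMux`, `probeHandler`, and `probeStatus` are hypothetical helpers, not part of the PR.

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

// newHealthMux registers the two probe paths on a fresh ServeMux.
func newHealthMux(livePath, readyPath string, live, ready func() bool) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc(livePath, probeHandler(live))
	mux.HandleFunc(readyPath, probeHandler(ready))
	return mux
}

// probeHandler answers 200 when probe passes and 503 otherwise.
func probeHandler(probe func() bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if probe() {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	}
}

// probeStatus performs the equivalent of the curl calls above and returns the status code.
func probeStatus(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}
```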

Fixes: #2039
Enables: Kubernetes-native health monitoring and container lifecycle management


@AlexStocks requested a review from Copilot on October 10, 2025 08:37

Copilot AI left a comment


Pull Request Overview

This PR adds comprehensive HTTP health check API support for Kubernetes integration, enabling liveness and readiness probes for container orchestration. The implementation provides a configurable, extensible health check system that integrates seamlessly with dubbo-go's existing architecture.

Key changes:

  • Implements configurable HTTP health endpoints (/health/live and /health/ready) for Kubernetes probes
  • Provides extensible HealthChecker interface allowing custom health logic implementation
  • Adds built-in health checkers for common scenarios (Dubbo services, graceful shutdown, timeout protection)

Reviewed Changes

Copilot reviewed 8 out of 10 changed files in this pull request and generated 3 comments.

Summary per file:

  • server/health_checker.go: Core health checker interfaces and composite implementation
  • server/health.go: HTTP server implementation with JSON response handling
  • server/builtin_health_checkers.go: Built-in health checker implementations for common use cases
  • metrics/health/health_registry.go: Integration with metrics system for automatic server lifecycle
  • imports/imports.go: Import registration for health metrics collector
  • config/metric_config.go: Configuration structure for health check settings
  • common/constant/key.go: Configuration keys for health check parameters
  • common/constant/default.go: Default values for health check configuration


result.Details["services"] = "none_exported"
} else {
result.Details["services"] = "exported"
result.Details["service_count"] = string(rune(serviceCount))

Copilot AI Oct 10, 2025


Converting integer to string via rune cast is incorrect and will produce unexpected results. Use strconv.Itoa(serviceCount) instead.

Copilot uses AI. Check for mistakes.
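A small illustration of why the flagged conversion is wrong. `string(rune(n))` yields the character whose Unicode code point is `n`, not the decimal digits of `n`. The two helper names are hypothetical.

```go
package main

import "strconv"

// runeConversion reproduces the flagged pattern: 65 becomes "A", and 3
// becomes the control character U+0003 rather than "3".
func runeConversion(n int) string { return string(rune(n)) }

// itoaConversion is the suggested fix: decimal text, so 12 becomes "12".
func itoaConversion(n int) string { return strconv.Itoa(n) }
```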
handler := h.handler
h.mu.RUnlock()

ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)

Copilot AI Oct 10, 2025


The 5-second timeout is hardcoded in multiple places. Consider extracting this as a configurable constant or deriving it from the health check timeout configuration.

type DefaultHealthCheckHandler struct{}

func (d *DefaultHealthCheckHandler) HandleLivenessCheck(w http.ResponseWriter, r *http.Request, result *HealthCheckResult) {
status := "UP"
Member


Change this to a const; do the same for 'DOWN'.
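A minimal sketch of the suggested change. The constant names `StatusUp` and `StatusDown` and the helper `statusOf` are hypothetical; the PR may choose different identifiers.

```go
package main

// Shared status strings, so handlers cannot drift apart on spelling.
const (
	StatusUp   = "UP"
	StatusDown = "DOWN"
)

// statusOf converts a checker's boolean result into the response status.
func statusOf(healthy bool) string {
	if healthy {
		return StatusUp
	}
	return StatusDown
}
```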

timeoutStr := h.url.GetParam(constant.HealthCheckTimeoutKey, constant.HealthCheckDefaultTimeout)
timeout, err := time.ParseDuration(timeoutStr)
if err != nil {
timeout = 10 * time.Second
Member


I think it is better to return an error directly so that users know their configuration is wrong.

Comment on lines +169 to +172
func (d *DefaultHealthChecker) CheckReadiness(ctx context.Context) bool {
// Basic readiness: assume ready
return true
}
Member


The default checker's liveness check returns true by default, which is fine. The readiness check could do a little more, such as checking the current service registration status. You can refer to https://www.bookstack.cn/read/dubbo-3.1-zh/3b6bb5d2b90f4b85.md

handler := h.handler
h.mu.RUnlock()

ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
Member


Why hard-code 5 seconds?

@No-SilverBullet
Member

Please add unit tests.

@Alanxtl
Contributor

Alanxtl commented Oct 19, 2025

Please fix the CI failure.
