-
Notifications
You must be signed in to change notification settings - Fork 201
K8SPXC-2206 Add configurable HAProxy health check parameters #2207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
K8SPXC-2206 Add configurable HAProxy health check parameters #2207
Conversation
bc23e2f
to
cb1bef1
Compare
I'm btw happy to add this change to the Helm chart and the docs as well, but wanted to wait with those PRs until I knew for sure this would be accepted. |
Add support for configuring HAProxy backend health check parameters without requiring full configuration override via haproxy.configuration. This allows users to: - Customize health check interval (default: 10000ms) - Configure fail threshold (default: 2 consecutive failures) - Configure rise threshold (default: 1 consecutive success) - Enable automatic connection termination on backend failure Benefits: - Faster failover detection (e.g., 6s with interval: 3000, fall: 2) - Active connection cleanup when backends fail - No need to duplicate operator's HAProxy configuration - Survives operator upgrades Implementation: - Added HAProxyHealthCheckSpec to CR API - Environment variables passed to haproxy_add_pxc_nodes.sh - Backwards compatible with existing deployments - Comprehensive test coverage Example usage: ```yaml haproxy: healthCheck: interval: 3000 fall: 2 rise: 1 shutdownOnMarkDown: true ``` Fixes percona#2206
cb1bef1
to
c4cc754
Compare
@timstoop haproxy does not have |
I totally agree that it makes sense to have it as default, it's what we were planning on doing in our Helm default code anyway. I only made it togglable in case there were good reasons that I didn't see and I really wanted to have this change implemented :-) I totally missed the 2205 PR, no idea how that could happen. |
- Add -r flag to read command to prevent backslash mangling - Add quotes around variable in echo command - Replace x-prefix comparison with direct comparison - Replace -a operator with && for better POSIX compliance
Add HA_SERVER_OPTIONS environment variable to all HAProxy statefulset compare files for e2e tests. This env var was added in the main feature and needs to be present in the expected test outputs. Tests affected: - haproxy - upgrade-haproxy - proxy-protocol - monitoring-2-0 - monitoring-pmm3 - default-cr
HA_SERVER_OPTIONS should come before REPLICAS_SVC_ONLY_READERS in the env list.
yes, please |
firs_node_replica='' | ||
main_node='' | ||
|
||
SERVER_OPTIONS=${HA_SERVER_OPTIONS:-'resolvers kubernetes check inter 10000 rise 1 fall 2 weight 1'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if I want to set custom HA_SERVER_OPTIONS via a secret? Will it work with your changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that would break. Fixed in my latest commit!
As requested in PR review, this feature is being added separately in PR percona#2205. Also fixed HA_SERVER_OPTIONS to allow custom values via secret: - When healthCheck is not configured in the CR, HA_SERVER_OPTIONS env var is not set by the operator, allowing it to be customized via the haproxy-env-vars secret or use the script's default value - When healthCheck is configured, the operator generates HA_SERVER_OPTIONS with the specified interval, rise, and fall values Changes: - Removed ShutdownOnMarkDown field from HAProxyHealthCheckSpec API - Removed HA_SHUTDOWN_ON_MARK_DOWN env var and shutdown_on_mark_down logic - Updated CRDs and generated code - Updated tests - Updated example CR documentation
Removed on marked down shutdown sessions option, as requested. Also fixed the handling of HA_SERVER_OPTIONS via a secret. |
The code generates env vars in this order: 1. PXC_SERVICE 2. HA_SERVER_OPTIONS (from buildHAProxyHealthCheckEnvVars) 3. IS_PROXY_PROTOCOL (added later if proxy protocol is configured) 4. REPLICAS_SVC_ONLY_READERS Updated compare files to match the generated order.
commit: f8e092d |
CHANGE DESCRIPTION
Problem:
When a PXC node fails (e.g., during rolling restart), HAProxy takes 20+ seconds to detect the failure with default settings (
check inter 10000
+fall 2
= 20s). Worse, existing client connections to the failed backend are NOT terminated, causing them to hang until TCP timeout (potentially minutes).The only workaround is to provide a complete HAProxy configuration via
haproxy.configuration
, which duplicates operator logic, breaks on upgrades, and is difficult to maintain.Cause:
HAProxy backend health check parameters (
interval
,fall
,rise
,on-marked-down shutdown-sessions
) are hardcoded in the operator and not exposed through the CR API.Solution:
Add a new
healthCheck
field to the HAProxy spec allowing granular control:This enables fast failover (6s vs 20s) and active connection cleanup without overriding the entire configuration.
CHECKLIST
Jira
Needs Doc
) and QA (Needs QA
)?Tests
compare/*-oc.yml
)?Config/Logging/Testability
Additional Notes:
Fixes #2206