Purge brokers no longer reported in metadata #4557
Conversation
The Kafka protocol allows brokers to have multiple host:port pairs for a given node Id; see, for example, the UpdateMetadata request, which contains a live_brokers list where each broker Id has a list of host:port pairs. It follows that the thing that uniquely identifies a broker is its Id, not the host:port.

The behaviour right now is that if we have multiple brokers with the same host:port but different Ids, the first broker in the list will be updated to have the Id of whatever broker we're looking at as we iterate through the brokers in the Metadata response in rd_kafka_parse_Metadata0(), e.g.:

Step 1. Broker[0] = Metadata.brokers[0]
Step 2. Broker[0] = Metadata.brokers[1]
Step 3. Broker[0] = Metadata.brokers[2]

A typical situation where brokers have the same host:port pair but differ in their Id is when the brokers are behind a load balancer.

The NODE_UPDATE mechanism responsible for this was originally added in b09ff60 ("Handle broker name and nodeid updates (issue confluentinc#343)") as a way to forcibly update a broker hostname if an Id is reused with a new host after the original one was decommissioned. But this isn't how the Java Kafka client works, so let's use the Metadata response as the source of truth instead of updating brokers when we can only match them by host:port.
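To make the intended behaviour concrete, here is a minimal, self-contained sketch of id-based matching. It is not librdkafka's actual implementation: the `broker_list` and `metadata_update_broker` names and the fixed-size array are hypothetical simplifications of the real `rd_kafka_broker_t` handling in `rd_kafka_parse_Metadata0()`.

```c
/* Hypothetical sketch: match Metadata brokers by node id only. */
#include <stdint.h>
#include <stdio.h>

#define MAX_BROKERS 16

struct broker {
        int32_t id;
        char    host[64];
        int     port;
};

struct broker_list {
        struct broker brokers[MAX_BROKERS];
        int           cnt;
};

/* Find a broker by node id; returns NULL if not present. */
static struct broker *find_by_id(struct broker_list *bl, int32_t id) {
        for (int i = 0; i < bl->cnt; i++)
                if (bl->brokers[i].id == id)
                        return &bl->brokers[i];
        return NULL;
}

/* Apply one Metadata broker entry: update the existing broker with the same
 * id, or add a new one. Never "steal" an entry that only matches host:port. */
static void metadata_update_broker(struct broker_list *bl, int32_t id,
                                   const char *host, int port) {
        struct broker *b = find_by_id(bl, id);
        if (!b) {
                if (bl->cnt >= MAX_BROKERS)
                        return;
                b = &bl->brokers[bl->cnt++];
        }
        b->id = id;
        snprintf(b->host, sizeof(b->host), "%s", host);
        b->port = port;
}

int main(void) {
        struct broker_list bl = { .cnt = 0 };

        /* Three brokers behind a load balancer: same host:port, distinct ids. */
        metadata_update_broker(&bl, 0, "lb.example.com", 9092);
        metadata_update_broker(&bl, 1, "lb.example.com", 9092);
        metadata_update_broker(&bl, 2, "lb.example.com", 9092);

        for (int i = 0; i < bl.cnt; i++)
                printf("broker id=%d at %s:%d\n", (int)bl.brokers[i].id,
                       bl.brokers[i].host, bl.brokers[i].port);
        return 0;
}
```

With id-based lookup, three metadata entries that share `lb.example.com:9092` but have ids 0, 1 and 2 end up as three distinct brokers instead of one entry being re-labelled on every pass.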
Brokers that are not in the metadata should be purged from the internal client lists. This helps to avoid annoying "No route to host" and other connection failure messages. Fixes confluentinc#238.
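A similarly hedged sketch of the purge pass, again with hypothetical names: after a Metadata response has been applied, any previously known broker whose id is absent from the response is dropped. In the real client this is where the removed broker's thread would be decommissioned, rather than an array being compacted.

```c
/* Hypothetical sketch: purge brokers whose id is no longer in metadata. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct broker {
        int32_t id;
        char    host[64];
        int     port;
};

/* True if `id` appears among the ids reported by the latest Metadata. */
static bool in_metadata(int32_t id, const int32_t *md_ids, int md_cnt) {
        for (int i = 0; i < md_cnt; i++)
                if (md_ids[i] == id)
                        return true;
        return false;
}

/* Compact the broker array in place, keeping only brokers still reported
 * in metadata, and return the new count. */
static int purge_missing_brokers(struct broker *brokers, int cnt,
                                 const int32_t *md_ids, int md_cnt) {
        int kept = 0;
        for (int i = 0; i < cnt; i++) {
                if (in_metadata(brokers[i].id, md_ids, md_cnt))
                        brokers[kept++] = brokers[i];
                else
                        printf("purging broker id=%d (%s:%d): not in metadata\n",
                               (int)brokers[i].id, brokers[i].host,
                               brokers[i].port);
        }
        return kept;
}

int main(void) {
        struct broker brokers[] = {
                {0, "kafka-0.example.com", 9092},
                {1, "kafka-1.example.com", 9092},
                {2, "kafka-2.example.com", 9092}, /* decommissioned */
        };
        int32_t md_ids[] = {0, 1}; /* ids from the latest Metadata response */
        int cnt = purge_missing_brokers(brokers, 3, md_ids, 2);
        printf("%d broker(s) remain\n", cnt);
        return 0;
}
```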
Hi @mfleming, thanks a lot for this contribution and sorry for letting it wait so long. We want to include these fixes in the next version. Both fixes are good; on the broker decommission fix we just want to do some additional checks.
Here are some comments, mainly for the first fix:
Hi @mfleming can I apply those changes or do you want to continue the PR? Thanks!
Hey, sorry for the delay. Yeah, you can apply those changes if you have the time; if not I'll get to them sometime this week.
Co-authored-by: Emanuele Sabellico <[email protected]>
Remove the possibility to modify rkb_nodeid after rkb creation.
@mfleming thanks, sorry for the delay too, I'm checking it again
Remove locking when accessing rkb_nodeid as it's now set only on creation and not modified anymore.
Add new brokers and reassign partitions in the mock cluster.
/sem-approve
and move documentation
/sem-approve
We're not going to merge this for 2.5.0, which is due in July, as we need to do more checks on possible regressions, but we want to merge it for a maintenance release in September.
Thanks for fixing things :)
Thank you for this PR!
Just wanted to say a big thank you to both of you, @mfleming for writing this and @emasab for reviewing. We just ran into a slow thread leak in a Kafka consumer at @MaterializeInc that will be fixed by this patch.
While going through the PR, I found a few issues. We need to fix these issues before releasing this PR, and it needs more testing. As a result, it won't be part of the upcoming 2.6 release.
Closes #4881
We agreed KIP-899 and KIP-1102 aren't strictly necessary; in the Java client they were disabled by default until recently. In any case the metadata response contains at least the majority of KRaft-eligible brokers, and if the majority changes, it must contain at least one broker from the previous set. There are a few things left to check, at least:
🎉 All Contributor License Agreements have been signed. Ready to merge.
[test 0151] Simplify the test removing `await_verification`. It's possible to await for the correct list of brokers in all tests, since decommissioning brokers are excluded from that list.
Checking test part.
Use `rk_broker_by_id` for learned broker ids to return sorted broker ids.
Verify nodename change through a test log interceptor. Used the test log interceptor for test 0151 too.
A few comments on the test part.
/sem-approve
Approving as I was the initial reviewer. Thanks @pranavrth and @mfleming!
Removing bootstrap brokers: in test 0121, bootstrap brokers from different clusters are added to the same list, which is something that should never be done. Previously the client kept both sets of bootstrap brokers and gave a warning; now it keeps only the `learned` brokers from the cluster that replies first.
/sem-approve
/sem-approve
* Fix for brokers with different Ids but same host:port. The Kafka protocol allows brokers to have multiple host:port pairs for a given node Id, e.g. see the UpdateMetadata request, which contains a live_brokers list where each broker Id has a list of host:port pairs. It follows that the thing that uniquely identifies a broker is its Id, not the host:port. The behaviour right now is that if we have multiple brokers with the same host:port but different Ids, the first broker in the list will be updated to have the Id of whatever broker we're looking at as we iterate through the brokers in the Metadata response in rd_kafka_parse_Metadata0(), e.g. Step 1: Broker[0] = Metadata.brokers[0]; Step 2: Broker[0] = Metadata.brokers[1]; Step 3: Broker[0] = Metadata.brokers[2]. A typical situation where brokers have the same host:port pair but differ in their Id is when the brokers are behind a load balancer. The NODE_UPDATE mechanism responsible for this was originally added in b09ff60 ("Handle broker name and nodeid updates (issue #343)") as a way to forcibly update a broker hostname if an Id is reused with a new host after the original one was decommissioned. But this isn't how the Java Kafka client works, so use the Metadata response as the source of truth instead of updating brokers when we can only match them by host:port.
* Fix for purging brokers no longer reported in metadata. Brokers that are not in the metadata should be purged from the internal client lists. This helps to avoid annoying "No route to host" and other connection failure messages. Fixes #238.
* Remove the possibility to modify rkb_nodeid after rkb creation.
* Remove locking when accessing rkb_nodeid, as it's now set only on creation and not modified anymore.
* Add new brokers and reassign partitions in the mock cluster.
* Remove bootstrap brokers after receiving learned ones. Wait for decommissioned threads after they've stopped instead of on termination.
* Handle the _DESTROY_BROKER local error, triggered when a broker is removed without terminating the client.
* Test 0151 improved with cluster replacement and cluster roll.
* Fix for test 0105, do_test_txn_broker_down_in_txn: remove leftover references when decommissioning a broker and avoid it being selected as leader again or having partitions delegated to it.
* Avoid selecting a configured broker as a logical or telemetry broker.
* Avoid selecting terminating brokers for sending calls or new connections.
* Remove the addressless count and avoid counting the logical broker for the "all brokers down" error, so the error is sent in all cases.
* Test: verify that decommissioning a broker while adding a new one with the same id doesn't cause problems.
* Handle the case where the current group coordinator is decommissioned, without leaving dangling references until the coordinator is changed. Test 0151 fix: given the FindCoordinator response adds a new broker (not a logical one, a learned one to set into `rkcg_curr_coord`), removed brokers can be added again even if not present in metadata. This is a mock-cluster-only problem, as in a real cluster a broker that is set down cannot be a coordinator. This commit changes the coordinator before setting down a broker that is the current coordinator.
* Remove the decommissioning broker from rk_broker_by_id when starting, to avoid multiple instances with the same id being added to the list. The decommissioned broker returned by the find can lead to multiple brokers with the same id being added.
* Don't select logical brokers at all for general-purpose requests like metadata ones.
* Schedule an immediate connection when there are no brokers connecting nor requests for connection. When we're in this state, if we respect the sparse connection interval, there's no event that notifies the awaiters at `rd_kafka_brokers_wait_state_change`, given it's an interval and not a timer. This is more visible when brokers are decommissioned and there's no broker-down event causing the notification. Check it with test `0113`, subtest `u_multiple_subscription_changes`.
* Remove all configured brokers when there are learned ones (a simplified sketch of this follows after the list). This is to avoid leaving connections to the bootstrap brokers that continue to be used instead of the learned ones, adding additional requests that can later be purged by the decommissioning of that last configured broker.
* Change test 0075 after removing all bootstrap brokers. This will be reverted with KIP-899.
* Remove rk_logical_broker_up_cnt.
* [test 0151] Simplify the test removing `await_verification`. It's possible to await for the correct list of brokers in all tests, since decommissioning brokers are excluded from that list.
* Remove broker state from labels.
* Remove `nodeid` from op.
* Use `rk_broker_by_id` for learned broker ids to return sorted broker ids.
* Verify nodename change through a test log interceptor. Used the test log interceptor for test 0151 too.

---------

Co-authored-by: Emanuele Sabellico <[email protected]>
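As a rough illustration of the "remove all configured brokers when there are learned ones" item referenced above, here is a hedged, self-contained sketch. The two-state `broker_source` enum and the function names are hypothetical simplifications, not the library's internal data structures; in the real client, removal happens by decommissioning the broker's thread.

```c
/* Hypothetical sketch: drop configured (bootstrap) brokers once at least
 * one learned broker exists. */
#include <stdio.h>

enum broker_source { BROKER_CONFIGURED, BROKER_LEARNED };

struct broker {
        int                id;
        enum broker_source source;
};

/* Keep configured brokers only while bootstrapping; once a learned broker
 * is present, compact the list down to learned brokers and return the new
 * count, so no duplicate connections to bootstrap addresses linger. */
static int drop_configured_if_learned(struct broker *b, int cnt) {
        int have_learned = 0, kept = 0;
        for (int i = 0; i < cnt; i++)
                if (b[i].source == BROKER_LEARNED)
                        have_learned = 1;
        if (!have_learned)
                return cnt; /* still bootstrapping: keep configured brokers */
        for (int i = 0; i < cnt; i++)
                if (b[i].source == BROKER_LEARNED)
                        b[kept++] = b[i];
        return kept;
}

int main(void) {
        struct broker brokers[] = {
                {-1, BROKER_CONFIGURED}, /* bootstrap.servers entry, no id yet */
                {0, BROKER_LEARNED},
                {1, BROKER_LEARNED},
        };
        int cnt = drop_configured_if_learned(brokers, 3);
        printf("%d broker(s) remain after bootstrap\n", cnt);
        return 0;
}
```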