@mfleming
Contributor

Brokers that are not in the metadata should be purged from the internal client lists. This helps to avoid annoying "No route to host" and other connection failure messages.

Fixes #238

The Kafka protocol allows for brokers to have multiple host:port pairs
for a given node Id, e.g. see UpdateMetadata request which contains a
live_brokers list where each broker Id has a list of host:port pairs. It
follows that what uniquely identifies a broker is its Id, not its
host:port.

The behaviour right now is that if we have multiple brokers with the
same host:port but different Ids, the first broker in the list will be
updated to have the Id of whatever broker we're looking at as we iterate
through the brokers in the Metadata response in
rd_kafka_parse_Metadata0(), e.g.

 Step 1. Broker[0] = Metadata.brokers[0]
 Step 2. Broker[0] = Metadata.brokers[1]
 Step 3. Broker[0] = Metadata.brokers[2]

A typical situation where brokers have the same host:port pair but
differ in their Id is if the brokers are behind a load balancer.

The NODE_UPDATE mechanism responsible for this was originally added in
b09ff60 ("Handle broker name and nodeid updates (issue confluentinc#343)") as a way
to forcibly update a broker hostname if an Id is reused with a new host
after the original one was decommissioned. But this isn't how the Java
Kafka client works, so let's use the Metadata response as the source of
truth instead of updating brokers if we can only match by their
host:port.
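
To make the intended behaviour concrete, here is a minimal, self-contained sketch of Id-keyed matching; none of the types or functions below are librdkafka's, they only model the logic described above:

```c
/* Hypothetical model, not librdkafka code: brokers are keyed by Id, so
 * entries with the same host:port but different Ids stay distinct. */
#include <stdio.h>

#define MAX_BROKERS 8

struct broker {
        int id;
        char host[64];
        int port;
};

struct client {
        struct broker brokers[MAX_BROKERS];
        int cnt;
};

static struct broker *find_by_id(struct client *c, int id) {
        for (int i = 0; i < c->cnt; i++)
                if (c->brokers[i].id == id)
                        return &c->brokers[i];
        return NULL;
}

/* Apply one Metadata broker entry: update the broker with this Id,
 * or add a new one if the Id is unknown. */
static void apply_metadata_broker(struct client *c, int id,
                                  const char *host, int port) {
        struct broker *b = find_by_id(c, id);
        if (!b) {
                if (c->cnt == MAX_BROKERS)
                        return;
                b = &c->brokers[c->cnt++];
                b->id = id;
        }
        snprintf(b->host, sizeof(b->host), "%s", host);
        b->port = port;
}

int main(void) {
        struct client c = {.cnt = 0};
        /* Three brokers behind a load balancer: same host:port, distinct Ids. */
        apply_metadata_broker(&c, 1, "lb.example.com", 9092);
        apply_metadata_broker(&c, 2, "lb.example.com", 9092);
        apply_metadata_broker(&c, 3, "lb.example.com", 9092);
        for (int i = 0; i < c.cnt; i++)
                printf("broker %d -> %s:%d\n", c.brokers[i].id,
                       c.brokers[i].host, c.brokers[i].port);
        return 0; /* prints three brokers, none overwritten */
}
```
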
Brokers that are not in the metadata should be purged from the internal
client lists. This helps to avoid annoying "No route to host" and other
connection failure messages.

Fixes confluentinc#238.
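
Similarly, a sketch of the purge half of the change: after a Metadata response has been applied, any broker whose Id was not reported gets decommissioned. Again, this is illustrative model code, not the actual librdkafka implementation:

```c
#include <stdbool.h>
#include <stdio.h>

/* Keep only brokers whose Id appears in the latest Metadata response;
 * the rest are decommissioned. Purely illustrative, not librdkafka code. */
static size_t purge_missing(int broker_ids[], size_t broker_cnt,
                            const int metadata_ids[], size_t metadata_cnt) {
        size_t kept = 0;
        for (size_t i = 0; i < broker_cnt; i++) {
                bool seen = false;
                for (size_t j = 0; j < metadata_cnt; j++)
                        if (broker_ids[i] == metadata_ids[j])
                                seen = true;
                if (seen)
                        broker_ids[kept++] = broker_ids[i];
                /* else: broker no longer reported -> stop its thread and drop
                 * it instead of retrying a stale address forever. */
        }
        return kept;
}

int main(void) {
        int brokers[] = {1, 2, 3};     /* currently known broker Ids */
        const int metadata[] = {1, 3}; /* Ids in the latest Metadata response */
        size_t n = purge_missing(brokers, 3, metadata, 2);
        for (size_t i = 0; i < n; i++)
                printf("kept broker %d\n", brokers[i]);
        return 0; /* broker 2 is purged */
}
```
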
Contributor

@emasab emasab left a comment

Hi @mfleming, thanks a lot for this contribution and sorry for letting it wait so long. We want to include these fixes in the next version. Both fixes are good; on the broker decommission one we just want to do some additional checks.

Here are some comments mainly for the first fix:

@emasab
Contributor

emasab commented May 21, 2024

Hi @mfleming can I apply those changes or do you want to continue the PR? Thanks!

@mfleming
Contributor Author

mfleming commented May 21, 2024 via email

@emasab
Contributor

emasab commented Jun 7, 2024

@mfleming thanks, sorry for the delay too, I'm checking it again

@emasab emasab requested a review from a team as a code owner June 10, 2024 13:56
@emasab
Contributor

emasab commented Jun 10, 2024

/sem-approve

@emasab
Contributor

emasab commented Jun 10, 2024

/sem-approve

@emasab
Contributor

emasab commented Jun 12, 2024

We're not going to merge this for 2.5.0, which is due in July, as we need to do more checks for possible regressions, but we want to merge it for a maintenance release in September.

@mfleming
Contributor Author

We're not going to merge this for 2.5.0, which is due in July, as we need to do more checks for possible regressions, but we want to merge it for a maintenance release in September.

Thanks for fixing things :)

@emasab
Contributor

emasab commented Jun 15, 2024

Thanks for fixing things :)

Thank you for this PR!

@benesch
Contributor

benesch commented Jul 1, 2024

Just wanted to say a big thank you to both of you—@mfleming for writing this and @emasab for reviewing. We just ran into a slow thread leak in a Kafka consumer at @MaterializeInc that will be fixed by this patch.

@pranavrth
Member

pranavrth commented Sep 16, 2024

While going through the PR, I found a few issues:

  1. The number of threads is increased as part of this PR. Currently, all the bootstrap server threads are updated to real broker threads if they appear in the metadata response. This PR creates new broker threads for the new brokers seen in the metadata, and the bootstrap server threads are not reused for this purpose. Due to this, we generally have 'n*2' broker threads for 'n' bootstrap servers, whereas earlier there were only 'n' threads.
  2. In the current librdkafka implementation, we always keep the bootstrap broker threads even if they are not reported in the metadata. This ensured that bootstrap broker threads are always available if the client needs to rebootstrap. This PR removes that behaviour.
  3. To fix the issue introduced in point 2, KIP-899 needs to be implemented.

We need to fix the above issues before releasing this PR, and it needs more testing. As a result, it won't be part of the upcoming 2.6 release.

@emasab
Contributor

emasab commented Oct 22, 2024

Closes #4881

@emasab
Contributor

emasab commented Dec 13, 2024

We agreed KIP-899 and KIP-1102 aren't strictly necessary; in the Java client they were disabled by default until recently.
The reason is that when a change in the broker set happens slowly enough to be detected by the periodic (5 min) metadata refresh, the client cannot remain without brokers.

In any case the metadata response contains at least the majority of KRaft-eligible brokers, and if the majority changes, it must contain at least one broker from the previous set.
So it's usually not possible for the client to remain without brokers, unless the set of brokers changes so fast that the client cannot reach any of them, and that's the case KIP-899 and KIP-1102 are addressing. But that last case is also a problem with the current librdkafka code, so we leave the improvement for a later PR.
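
For reference, the periodic refresh mentioned above is controlled by librdkafka's `topic.metadata.refresh.interval.ms` property (default 300000 ms, i.e. 5 minutes). A quick sketch of lowering it, purely as an illustration (the 60000 value is arbitrary):

```c
#include <stdio.h>
#include <librdkafka/rdkafka.h>

int main(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        /* Default is 300000 ms; a lower value makes the client pick up
         * broker-set changes sooner. */
        if (rd_kafka_conf_set(conf, "topic.metadata.refresh.interval.ms",
                              "60000", errstr, sizeof(errstr)) !=
            RD_KAFKA_CONF_OK) {
                fprintf(stderr, "%s\n", errstr);
                rd_kafka_conf_destroy(conf);
                return 1;
        }

        rd_kafka_conf_destroy(conf);
        return 0;
}
```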

There are a few things left to check, at least:

  • removing bootstrap brokers without an id (-1) and adding brokers with real ids and advertised hostnames, to avoid duplicating the threads
  • checking some fields in rd_kafka_s: no_idemp_brokers, rk_telemetry.preferred_broker, rk_broker_down_cnt, rk_broker_up_cnt, rk_broker_cnt, ... when brokers are removed. It could help to change the state to DOWN before removing them.
  • maybe there's something to change in the stats_cb, but probably nothing to do
  • brokers configured with rd_kafka_brokers_add must be removed after usage too (usage sketched after this list)
  • currently, decommissioned threads are still joined on destroy; is it possible to join them from time to time, without accumulating them but also without blocking the main thread? Maybe at specific intervals, so they're already terminated when joined
  • checking thoroughly for memory leaks and use-after-free cases
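
For context on the `rd_kafka_brokers_add` bullet above, this is the public API for adding brokers at runtime; a minimal usage sketch (broker addresses are placeholders):

```c
#include <stdio.h>
#include <librdkafka/rdkafka.h>

int main(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();
        rd_kafka_t *rk =
            rd_kafka_new(RD_KAFKA_PRODUCER, conf, errstr, sizeof(errstr));
        if (!rk) {
                fprintf(stderr, "%s\n", errstr);
                rd_kafka_conf_destroy(conf);
                return 1;
        }

        /* Brokers added at runtime; per the checklist above, these should
         * also be decommissioned once learned brokers replace them.
         * Addresses are placeholders. */
        int added = rd_kafka_brokers_add(rk, "broker1:9092,broker2:9092");
        printf("added %d brokers\n", added);

        rd_kafka_destroy(rk);
        return 0;
}
```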

@emasab
Contributor

emasab commented Jan 10, 2025

  • checking some fields in rd_kafka_s: no_idemp_brokers, rk_telemetry.preferred_broker, rk_broker_down_cnt, rk_broker_up_cnt, rk_broker_cnt, ... when brokers are removed. It could help to change the state to DOWN before removing them.

no_idemp_brokers doesn't seem to be used.
rk_broker_cnt is decremented in rd_kafka_broker_thread_main.
rk_broker_down_cnt and rk_broker_up_cnt are changed when the broker thread receives RD_KAFKA_OP_TERMINATE: it calls rd_kafka_broker_fail, which sets the state to DOWN and updates those counters.
rk_telemetry.preferred_broker is also cleared in rd_kafka_broker_fail.

@confluent-cla-assistant

confluent-cla-assistant bot commented Feb 18, 2025

🎉 All Contributor License Agreements have been signed. Ready to merge.
✅ mfleming
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.

emasab added 2 commits March 28, 2025 16:32
[test 0151] Simplify the test removing `await_verification`. It's possible to await for the correct list of brokers in all tests since decommissioning brokers are excluded from that list
Member

@pranavrth pranavrth left a comment

Checking test part.

emasab added 4 commits March 31, 2025 15:48
Verify nodename change through a test log interceptor.
Used the test log interceptor for test 0151 too
Member

@pranavrth pranavrth left a comment

A few comments on the test part.

@emasab
Contributor

emasab commented Apr 1, 2025

/sem-approve

Contributor

@emasab emasab left a comment

Approving as I was initially reviewing. Thanks @pranavrth and @mfleming !

Member

@pranavrth pranavrth left a comment

LGTM! Great work @emasab and @mfleming. Thanks.

removing bootstrap brokers:

In test 0121, bootstrap brokers from different clusters
are added to the same list, which is something that should never
be done. Previously the client kept both sets of bootstrap
brokers and gave a warning. Now it keeps only the
`learned` brokers from the cluster that replies first.
@emasab
Contributor

emasab commented Apr 1, 2025

/sem-approve

@emasab
Contributor

emasab commented Apr 1, 2025

/sem-approve

Member

@pranavrth pranavrth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @emasab and @mfleming !!

@emasab emasab merged commit f7c4273 into confluentinc:master Apr 1, 2025
2 checks passed
airlock-confluentinc bot pushed a commit that referenced this pull request Apr 8, 2025
* Fix for brokers with different Ids but same host:port

The Kafka protocol allows for brokers to have multiple host:port pairs
for a given node Id, e.g. see UpdateMetadata request which contains a
live_brokers list where each broker Id has a list of host:port pairs. It
follows that what uniquely identifies a broker is its Id, not its
host:port.

The behaviour right now is that if we have multiple brokers with the
same host:port but different Ids, the first broker in the list will be
updated to have the Id of whatever broker we're looking at as we iterate
through the brokers in the Metadata response in
rd_kafka_parse_Metadata0(), e.g.

 Step 1. Broker[0] = Metadata.brokers[0]
 Step 2. Broker[0] = Metadata.brokers[1]
 Step 3. Broker[0] = Metadata.brokers[2]

A typical situation where brokers have the same host:port pair but
differ in their Id is if the brokers are behind a load balancer.

The NODE_UPDATE mechanism responsible for this was originally added in
b09ff60 ("Handle broker name and nodeid updates (issue #343)") as a way
to forcibly update a broker hostname if an Id is reused with a new host
after the original one was decommissioned. But this isn't how the Java
Kafka client works, so let's use the Metadata response as the source of
truth instead of updating brokers if we can only match by their
host:port.

* Fix for purging brokers no longer reported in metadata

Brokers that are not in the metadata should be purged from the internal
client lists. This helps to avoid annoying "No route to host" and other
connection failure messages.

Fixes #238.

* Remove the possibility to modify rkb_nodeid after rkb creation.

* Remove locking when accessing rkb_nodeid as it's now set only on creation and not modified anymore

* Add new brokers and reassign partitions in the mock cluster

* Remove bootstrap broker after receiving learned
ones. Wait for decommissioned threads after they've stopped instead of on termination.

* Handle the _DESTROY_BROKER local error,
triggered when a broker is removed
without terminating the client.

* Test 0151 improved with cluster replacement
and cluster roll

* Fix for test 0105, do_test_txn_broker_down_in_txn:
remove leftover references when decommissioning a broker
and avoid it being selected as leader again or having
partitions delegated to it

* Avoid selecting a configured broker as a logical or telemetry broker

* Avoid selecting terminating brokers for sending calls or new connections

* Remove addressless count and avoid counting the logical
broker for the all-brokers-down error, to send
the error in all cases

* Test: verify that decommissioning a broker while adding a new one with the same id doesn't cause problems

* Handle the case where the current group coordinator
is decommissioned without leaving dangling references
until the coordinator is changed

Test 0151 fix. The find coordinator response adds a new broker (not a logical one, a learned one to set into `rkcg_curr_coord`),
so removed brokers can be added again even if not present in metadata. This is a mock-cluster-only problem, as in a
real cluster a broker that is set down cannot be a coordinator. This commit changes the coordinator before
setting down a broker that is the current coordinator

* Remove the decommissioning broker from rk_broker_by_id when decommissioning starts, to avoid multiple
instances with the same id being added to the list.
The decommissioned broker returned by the find could otherwise lead to multiple brokers with the same id being added.

* Don't select logical brokers at all
for general purpose requests like metadata ones

* Schedule an immediate connection when there are no brokers connecting nor requests for connection.
When we're in this state, if we respect the sparse connection interval, there's no event that notifies the awaiters at `rd_kafka_brokers_wait_state_change`, given it's an interval and not a timer.
This is more visible when brokers are decommissioned and there's no broker down event causing the notification.
Check it with test `0113`, subtest `u_multiple_subscription_changes`.

* Remove all configured brokers when there are learned
ones. This is to avoid leaving connections to the bootstrap brokers that would continue to be used instead of the learned ones, adding additional requests that can later be purged by the decommissioning of that last configured broker.

* Change test 0075 after removing all bootstrap brokers.
This will be reverted with KIP-899

* Remove rk_logical_broker_up_cnt

* [test 0151] Simplify the test removing `await_verification`. It's possible to await for the correct list of brokers in all tests since decommissioning brokers are excluded from that list

* Remove broker state from labels

* Remove `nodeid` from op

* Use `rk_broker_by_id` for learned broker ids to return sorted broker ids

* Verify nodename change through a
test log interceptor.
Used the test log interceptor for test 0151 too


---------

Co-authored-by: Emanuele Sabellico <[email protected]>

Development

Successfully merging this pull request may close these issues.

Purge brokers no longer reported in metadata
