Description
During a rolling upgrade of a multi-node Kafka cluster, we change the broker listener configuration in several steps and restart the brokers one by one. After the final listener configuration is applied and the brokers are restarted, Kafka clients using librdkafka experience connection failures until the client process is restarted.
Environment
- Kafka version: 3.9.0
- librdkafka version: below 2.10.0
Upgrade Steps
We apply the following configuration changes step by step, restarting the brokers after each change (a small metadata check that can be run after each step is sketched below the list):

1. listener.security.protocol.map=BROKER:SASL_PLAINTEXT,CONTROLLER:SASL_PLAINTEXT,SASL_PLAINTEXT:SASL_PLAINTEXT
   listeners=SASL_PLAINTEXT://<hostname>:9092,BROKER://<hostname>:9094
   inter.broker.listener.name=SASL_PLAINTEXT

2. listener.security.protocol.map=BROKER:SASL_PLAINTEXT,CONTROLLER:SASL_PLAINTEXT,SASL_PLAINTEXT:SASL_PLAINTEXT
   listeners=SASL_PLAINTEXT://<hostname>:9092,BROKER://<hostname>:9094
   inter.broker.listener.name=BROKER

3. listener.security.protocol.map=BROKER:SASL_PLAINTEXT,CONTROLLER:SASL_PLAINTEXT,SASL_PLAINTEXT:SASL_PLAINTEXT
   listeners=BROKER://<hostname>:9092,SASL_PLAINTEXT://<hostname>:9094
   inter.broker.listener.name=BROKER

4. listener.security.protocol.map=BROKER:SASL_PLAINTEXT,CONTROLLER:SASL_PLAINTEXT,SASL_PLAINTEXT:SASL_PLAINTEXT
   listeners=BROKER://<hostname>:9092
   inter.broker.listener.name=BROKER
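To see what clients actually observe between steps, a standalone metadata dump like the one below can be run against the cluster. This is only a minimal sketch: the bootstrap address, SASL mechanism, and credentials are placeholders and must be adapted to the environment described above.

```c
/* Sketch: print the broker list (advertised host:port per broker) that
 * librdkafka sees, so advertised listeners can be checked after every
 * upgrade step. Connection settings below are placeholders. */
#include <stdio.h>
#include <librdkafka/rdkafka.h>

int main(void) {
    char errstr[512];
    rd_kafka_conf_t *conf = rd_kafka_conf_new();

    /* Placeholder connection settings - adjust to your cluster. */
    rd_kafka_conf_set(conf, "bootstrap.servers", "khazad13:9092", errstr, sizeof(errstr));
    rd_kafka_conf_set(conf, "security.protocol", "sasl_plaintext", errstr, sizeof(errstr));
    rd_kafka_conf_set(conf, "sasl.mechanism", "PLAIN", errstr, sizeof(errstr));
    rd_kafka_conf_set(conf, "sasl.username", "<user>", errstr, sizeof(errstr));
    rd_kafka_conf_set(conf, "sasl.password", "<password>", errstr, sizeof(errstr));

    rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_PRODUCER, conf, errstr, sizeof(errstr));
    if (!rk) {
        fprintf(stderr, "failed to create client: %s\n", errstr);
        return 1;
    }

    const struct rd_kafka_metadata *md;
    rd_kafka_resp_err_t err = rd_kafka_metadata(rk, 0 /* no topics */, NULL, &md, 10000);
    if (err) {
        fprintf(stderr, "metadata request failed: %s\n", rd_kafka_err2str(err));
    } else {
        for (int i = 0; i < md->broker_cnt; i++)
            printf("broker %d advertises %s:%d\n",
                   (int)md->brokers[i].id, md->brokers[i].host, md->brokers[i].port);
        rd_kafka_metadata_destroy(md);
    }

    rd_kafka_destroy(rk);
    return 0;
}
```

Running this before and after each broker restart would show whether the advertised listeners reported in metadata already match the new configuration, or whether stale addresses are still being returned.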
After step 4, when a broker is restarted, clients start reporting connection errors such as:
BrokerTransportFailure (Local: Broker transport failure): sasl_plaintext://khazad13:9092/167843919: Connection setup timed out in state CONNECT (after 30029ms in state CONNECT, 1 identical error(s) suppressed)
Connect to ipv4#[10.1.24.76:9094] failed: Connection refused (after 0ms in state CONNECT, 6 identical error(s) suppressed)
The issue appears to be that the listener configuration has changed, but the Kafka client is still trying to connect to the previous address (as shown in the example, it keeps attempting khazad13 on port 9094, a listener that no longer exists after step 4).
The errors persist until the client process itself is restarted, after which everything works fine.
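For older librdkafka versions, one possible mitigation, sketched below under assumptions rather than as a confirmed fix, is to watch for persistent transport failures via the error callback and have the application recreate its client handle instead of restarting the whole process. The failure threshold and the recreate flag are application-level assumptions, not librdkafka features.

```c
/* Sketch: count persistent broker transport failures reported by librdkafka
 * and signal the application's main loop to tear down and recreate the
 * rd_kafka_t handle. Threshold and flag names are assumptions. */
#include <stdio.h>
#include <stdbool.h>
#include <stdatomic.h>
#include <librdkafka/rdkafka.h>

#define FAILURE_THRESHOLD 10   /* arbitrary; tune to your error rate */

static atomic_int  transport_failures;
static atomic_bool recreate_client;

static void error_cb(rd_kafka_t *rk, int err, const char *reason, void *opaque) {
    (void)rk; (void)opaque;
    if (err == RD_KAFKA_RESP_ERR__TRANSPORT ||
        err == RD_KAFKA_RESP_ERR__ALL_BROKERS_DOWN) {
        fprintf(stderr, "broker transport error: %s\n", reason);
        /* After repeated failures, ask the application to rebuild the client. */
        if (atomic_fetch_add(&transport_failures, 1) + 1 >= FAILURE_THRESHOLD)
            atomic_store(&recreate_client, true);
    }
}

/* When building the configuration:
 *     rd_kafka_conf_set_error_cb(conf, error_cb);
 * The main loop would check recreate_client, call rd_kafka_destroy() on the
 * old handle and create a fresh one with the current bootstrap servers. */
```

This effectively automates the "restart the client" workaround at the level of the handle rather than the process; whether it is still needed on 2.10.0 and later is part of the question below.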
Observations
- The issue occurs with older versions of librdkafka (e.g. 2.8.0).
- With librdkafka 2.10.0, the issue still happens during the upgrade, but after all brokers are restarted, the clients recover without requiring a restart.
- According to the changelog, there are fixes related to broker identification and the removal of unavailable brokers (#4557 "Purge brokers no longer reported in metadata", #4970 "Code and tests fixes to make the full test suite pass"). Is this behavior expected, and has the issue been fully resolved in 2.10.0 or later?
Questions
- Is this client-side connection failure expected during a rolling upgrade with listener changes?
- Is a client restart the only workaround for older librdkafka versions?
- Is this issue considered resolved in recent librdkafka versions, or are there recommended best practices for Kafka upgrades involving listener changes?
Thanks in advance for your help!