Commit af451bb (1 parent: 6702358)

PR #4970: Code and tests fixes to make the full test suite pass


67 files changed: +1898 / −499 lines

.semaphore/run-all-tests.yml (77 additions, 0 deletions)

@@ -0,0 +1,77 @@
+version: v1.0
+name: run-all-tests
+
+agent:
+  machine:
+    type: s1-prod-ubuntu22-04-amd64-1
+
+execution_time_limit:
+  hours: 3
+
+global_job_config:
+  prologue:
+    commands:
+      - checkout
+      - '[[ -z "$GIT_REF" ]] || git checkout $GIT_REF'
+      - wget -O rapidjson-dev.deb https://launchpad.net/ubuntu/+archive/primary/+files/rapidjson-dev_1.1.0+dfsg2-3_all.deb
+      - sudo dpkg -i rapidjson-dev.deb
+      - sudo apt update
+      - sudo apt remove -y needrestart
+      - sudo apt install -y valgrind
+      - python3 -m pip install -U pip
+      - python3 -m pip -V
+      - (cd tests && python3 -m pip install -r requirements.txt)
+      - ./configure --install-deps
+      - make -j all
+      - make -j -C tests build
+      - sem-version java 17
+
+blocks:
+  - name: "Run all tests (x86_64)"
+    dependencies: []
+    task:
+      agent:
+        machine:
+          type: s1-prod-ubuntu22-04-amd64-2
+      prologue:
+        commands:
+          - if [[ "$TEST_ARCHES" != *"x86_64"* ]]; then exit 0; fi
+      jobs:
+        - name: "PLAINTEXT cluster (x86_64)"
+          env_vars:
+            - name: TEST_SSL
+              value: "False"
+          commands:
+            - if [[ "$TEST_TYPE" != *"plaintext"* ]]; then exit 0; fi
+            - ./tests/run-all-tests.sh
+        - name: "SSL cluster (x86_64)"
+          env_vars:
+            - name: TEST_SSL
+              value: "True"
+          commands:
+            - if [[ "$TEST_TYPE" != *"ssl"* ]]; then exit 0; fi
+            - ./tests/run-all-tests.sh
+  - name: "Run all tests (aarch64)"
+    dependencies: []
+    task:
+      agent:
+        machine:
+          type: s1-prod-ubuntu22-04-arm64-2
+      prologue:
+        commands:
+          - if [[ "$TEST_ARCHES" != *"aarch64"* ]]; then exit 0; fi
+      jobs:
+        - name: "PLAINTEXT cluster (aarch64)"
+          env_vars:
+            - name: TEST_SSL
+              value: "False"
+          commands:
+            - if [[ "$TEST_TYPE" != *"plaintext"* ]]; then exit 0; fi
+            - ./tests/run-all-tests.sh
+        - name: "SSL cluster (aarch64)"
+          env_vars:
+            - name: TEST_SSL
+              value: "True"
+          commands:
+            - if [[ "$TEST_TYPE" != *"ssl"* ]]; then exit 0; fi
+            - ./tests/run-all-tests.sh
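The prologue and job commands above gate execution with bash substring tests such as `[[ "$TEST_ARCHES" != *"x86_64"* ]]`, exiting early when the job's architecture or test type is absent from the comma-separated selector variable. A minimal Python sketch of that gating logic (the helper name `should_run` is hypothetical, not part of the commit):

```python
def should_run(selector: str, wanted: str) -> bool:
    """Mirror the pipeline's bash gate: a job proceeds only when `wanted`
    occurs as a substring of the comma-separated selector, e.g.
    TEST_ARCHES="x86_64,aarch64" or TEST_TYPE="plaintext,ssl"."""
    return wanted in selector

# Jobs whose gate fails simply `exit 0` and are reported as passed.
assert should_run("x86_64,aarch64", "aarch64")   # aarch64 block runs
assert not should_run("x86_64", "aarch64")       # aarch64 block is skipped
assert should_run("plaintext,ssl", "ssl")        # SSL jobs run
```

As in bash, this is a plain substring match rather than exact list membership, which is fine for the pipeline's small fixed vocabulary of values.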

.semaphore/semaphore.yml (22 additions, 0 deletions)

@@ -384,3 +384,25 @@ blocks:
           # Upload all packages to project artifact store
           - artifact push project packages --destination librdkafka-packages-${SEMAPHORE_GIT_TAG_NAME}-${SEMAPHORE_WORKFLOW_ID}
           - echo Thank you
+
+promotions:
+  - name: Run all tests on master commits
+    pipeline_file: run-all-tests.yml
+    parameters:
+      env_vars:
+        - required: true
+          name: TEST_KAFKA_GIT_REF
+          default_value: 3.8.0
+        - required: true
+          name: TEST_TYPE
+          default_value: plaintext,ssl
+        - required: true
+          name: TEST_ARCHES
+          default_value: x86_64,aarch64
+        - required: true
+          name: TEST_PARALLEL
+          default_value: "1"
+    auto_promote_on:
+      - result: passed
+        branch:
+          - "master"

CHANGELOG.md (97 additions, 1 deletion)

@@ -1,8 +1,36 @@
 # librdkafka v2.9.0
 
+librdkafka v2.9.0 is a feature release:
+
 * Identify brokers only by broker id (#4557, @mfleming)
 * Remove unavailable brokers and their thread (#4557, @mfleming)
-
+* Fix for librdkafka yielding before timeouts had been reached (#)
+* Removed a 500ms latency when a consumer partition switches to a different
+  leader (#)
+* The mock cluster implementation removes brokers from Metadata response
+  when they're not available, this simulates better the actual behavior of
+  a cluster that is using KRaft (#).
+* Doesn't remove topics from cache on temporary Metadata errors but only
+  on metadata cache expiry (#).
+* Doesn't mark the topic as unknown if it had been marked as existent earlier
+  and `topic.metadata.propagation.max.ms` hasn't passed still (#).
+* Doesn't update partition leaders if the topic in metadata
+  response has errors (#).
+* Only topic authorization errors in a metadata response are considered
+  permanent and are returned to the user (#).
+* The function `rd_kafka_offsets_for_times` refreshes leader information
+  if the error requires it, allowing it to succeed on
+  subsequent manual retries (#).
+* Deprecated `api.version.request`, `api.version.fallback.ms` and
+  `broker.version.fallback` configuration properties (#).
+* When consumer is closed before destroying the client, the operations queue
+  isn't purged anymore as it contains operations
+  unrelated to the consumer group (#).
+* When making multiple changes to the consumer subscription in a short time,
+  no unknown topic error is returned for topics that are in the new subscription but weren't in previous one (#).
+* Fix for the case where a metadata refresh enqueued on an unreachable broker
+  prevents refreshing the controller or the coordinator until that broker
+  becomes reachable again (#).
 
 ## Fixes
 
@@ -20,6 +48,74 @@
   temporarily or permanently so we always remove it and it'll be added back when
   it becomes available again.
   Happens since 1.x (#4557, @mfleming).
+* Issues: #
+  librdkafka code using `cnd_timedwait` was yielding before a timeout occurred
+  without the condition being fulfilled because of spurious wake-ups.
+  Solved by verifying with a monotonic clock that the expected point in time
+  was reached and calling the function again if needed.
+  Happens since 1.x (#).
+* Issues: #
+  Doesn't remove topics from cache on temporary Metadata errors but only
+  on metadata cache expiry. It allows the client to continue working
+  in case of temporary problems to the Kafka metadata plane.
+  Happens since 1.x (#).
+* Issues: #
+  Doesn't mark the topic as unknown if it had been marked as existent earlier
+  and `topic.metadata.propagation.max.ms` hasn't passed still. It achieves
+  this property expected effect even if a different broker had
+  previously reported the topic as existent.
+  Happens since 1.x (#).
+* Issues: #
+  Doesn't update partition leaders if the topic in metadata
+  response has errors. It's in line with what Java client does and allows
+  to avoid segmentation faults for unknown partitions.
+  Happens since 1.x (#).
+* Issues: #
+  Only topic authorization errors in a metadata response are considered
+  permanent and are returned to the user. It's in line with what Java client
+  does and avoids returning to the user an error that wasn't meant to be
+  permanent.
+  Happens since 1.x (#).
+* Issues: #
+  Fix for the case where a metadata refresh enqueued on an unreachable broker
+  prevents refreshing the controller or the coordinator until that broker
+  becomes reachable again. Given the request continues to be retried on that
+  broker, the counter for refreshing complete broker metadata doesn't reach
+  zero and prevents the client from obtaining the new controller or group or transactional coordinator.
+  It causes a series of debug messages like:
+  "Skipping metadata request: ... full request already in-transit", until
+  the broker the request is enqueued on is up again.
+  Solved by not retrying these kinds of metadata requests.
+  Happens since 1.x (#).
+
+### Consumer fixes
+
+* Issues: #
+  When switching to a different leader a consumer could wait 500ms
+  (`fetch.error.backoff.ms`) before starting to fetch again. The fetch backoff wasn't reset when joining the new broker.
+  Solved by resetting it, given it's not needed to backoff
+  the first fetch on a different node. This way faster leader switches are
+  possible.
+  Happens since 1.x (#).
+* Issues: #
+  The function `rd_kafka_offsets_for_times` refreshes leader information
+  if the error requires it, allowing it to succeed on
+  subsequent manual retries. Similar to the fix done in 2.3.0 in
+  `rd_kafka_query_watermark_offsets`. Additionally, the partition
+  current leader epoch is taken from metadata cache instead of
+  from passed partitions.
+  Happens since 1.x (#).
+* Issues: #
+  When consumer is closed before destroying the client, the operations queue
+  isn't purged anymore as it contains operations
+  unrelated to the consumer group.
+  Happens since 1.x (#).
+* Issues: #
+  When making multiple changes to the consumer subscription in a short time,
+  no unknown topic error is returned for topics that are in the new subscription
+  but weren't in previous one. This was due to the metadata request relative
+  to previous subscription.
+  Happens since 1.x (#).
 
 
 
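The `cnd_timedwait` fix in the changelog above guards against spurious wake-ups by re-checking a monotonic-clock deadline and waiting again when neither the condition nor the timeout has been reached. A minimal Python sketch of that pattern (the helper name `wait_until` is hypothetical; the real fix is in librdkafka's C code):

```python
import threading
import time

def wait_until(cond: threading.Condition, predicate, timeout: float) -> bool:
    """Wait on `cond` until predicate() holds or `timeout` seconds elapse.
    A wake-up alone is not trusted: the monotonic clock decides whether the
    deadline was actually reached; otherwise we simply wait again."""
    deadline = time.monotonic() + timeout
    with cond:
        while not predicate():
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return predicate()  # deadline reached: report final state
            cond.wait(remaining)    # may return spuriously; loop re-checks
        return True
```

Without the monotonic re-check, a spurious wake-up would make the caller return early, which is exactly the premature yielding the changelog describes.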

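The consumer fix that removes the 500ms leader-switch latency resets the `fetch.error.backoff.ms` backoff when the partition moves to a different leader. A toy Python model of that state change (class and method names are hypothetical, for illustration only):

```python
import time

class PartitionFetchState:
    """Toy model of a partition's fetch backoff. After a fetch error,
    fetching backs off for `backoff_s` seconds; switching to a different
    leader clears the backoff so the first fetch on the new broker is
    not delayed, as in the librdkafka fix described above."""

    def __init__(self, backoff_s: float = 0.5):   # fetch.error.backoff.ms
        self.backoff_s = backoff_s
        self.leader_id = None
        self.backoff_until = 0.0                  # monotonic timestamp

    def on_fetch_error(self):
        self.backoff_until = time.monotonic() + self.backoff_s

    def on_leader_change(self, new_leader_id):
        if new_leader_id != self.leader_id:
            self.leader_id = new_leader_id
            self.backoff_until = 0.0  # reset: no wait on the new leader

    def can_fetch(self) -> bool:
        return time.monotonic() >= self.backoff_until
```

Before the fix, `on_leader_change` would leave `backoff_until` in place, so the first fetch from the new leader could stall for the full backoff interval.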
CONFIGURATION.md (3 additions, 3 deletions)

@@ -54,10 +54,10 @@ resolve_cb | * | |
 opaque | * | | | low | Application opaque (set with rd_kafka_conf_set_opaque()) <br>*Type: see dedicated API*
 default_topic_conf | * | | | low | Default topic configuration for automatically subscribed topics <br>*Type: see dedicated API*
 internal.termination.signal | * | 0 .. 128 | 0 | low | Signal that librdkafka will use to quickly terminate on rd_kafka_destroy(). If this signal is not set then there will be a delay before rd_kafka_wait_destroyed() returns true as internal threads are timing out their system calls. If this signal is set however the delay will be minimal. The application should mask this signal as an internal signal handler is installed. <br>*Type: integer*
-api.version.request | * | true, false | true | high | Request broker's supported API versions to adjust functionality to available protocol features. If set to false, or the ApiVersionRequest fails, the fallback version `broker.version.fallback` will be used. **NOTE**: Depends on broker version >=0.10.0. If the request is not supported by (an older) broker the `broker.version.fallback` fallback is used. <br>*Type: boolean*
+api.version.request | * | true, false | true | high | **DEPRECATED** **Post-deprecation actions: remove this configuration property, brokers < 0.10.0 won't be supported anymore in librdkafka 3.x.** Request broker's supported API versions to adjust functionality to available protocol features. If set to false, or the ApiVersionRequest fails, the fallback version `broker.version.fallback` will be used. **NOTE**: Depends on broker version >=0.10.0. If the request is not supported by (an older) broker the `broker.version.fallback` fallback is used. <br>*Type: boolean*
 api.version.request.timeout.ms | * | 1 .. 300000 | 10000 | low | Timeout for broker API version requests. <br>*Type: integer*
-api.version.fallback.ms | * | 0 .. 604800000 | 0 | medium | Dictates how long the `broker.version.fallback` fallback is used in the case the ApiVersionRequest fails. **NOTE**: The ApiVersionRequest is only issued when a new connection to the broker is made (such as after an upgrade). <br>*Type: integer*
-broker.version.fallback | * | | 0.10.0 | medium | Older broker versions (before 0.10.0) provide no way for a client to query for supported protocol features (ApiVersionRequest, see `api.version.request`) making it impossible for the client to know what features it may use. As a workaround a user may set this property to the expected broker version and the client will automatically adjust its feature set accordingly if the ApiVersionRequest fails (or is disabled). The fallback broker version will be used for `api.version.fallback.ms`. Valid values are: 0.9.0, 0.8.2, 0.8.1, 0.8.0. Any other value >= 0.10, such as 0.10.2.1, enables ApiVersionRequests. <br>*Type: string*
+api.version.fallback.ms | * | 0 .. 604800000 | 0 | medium | **DEPRECATED** **Post-deprecation actions: remove this configuration property, brokers < 0.10.0 won't be supported anymore in librdkafka 3.x.** Dictates how long the `broker.version.fallback` fallback is used in the case the ApiVersionRequest fails. **NOTE**: The ApiVersionRequest is only issued when a new connection to the broker is made (such as after an upgrade). <br>*Type: integer*
+broker.version.fallback | * | | 0.10.0 | medium | **DEPRECATED** **Post-deprecation actions: remove this configuration property, brokers < 0.10.0 won't be supported anymore in librdkafka 3.x.** Older broker versions (before 0.10.0) provide no way for a client to query for supported protocol features (ApiVersionRequest, see `api.version.request`) making it impossible for the client to know what features it may use. As a workaround a user may set this property to the expected broker version and the client will automatically adjust its feature set accordingly if the ApiVersionRequest fails (or is disabled). The fallback broker version will be used for `api.version.fallback.ms`. Valid values are: 0.9.0, 0.8.2, 0.8.1, 0.8.0. Any other value >= 0.10, such as 0.10.2.1, enables ApiVersionRequests. <br>*Type: string*
 allow.auto.create.topics | * | true, false | false | low | Allow automatic topic creation on the broker when subscribing to or assigning non-existent topics. The broker must also be configured with `auto.create.topics.enable=true` for this configuration to take effect. Note: the default value (true) for the producer is different from the default value (false) for the consumer. Further, the consumer default value is different from the Java consumer (true), and this property is not supported by the Java producer. Requires broker version >= 0.11.0.0, for older broker versions only the broker configuration applies. <br>*Type: boolean*
 security.protocol | * | plaintext, ssl, sasl_plaintext, sasl_ssl | plaintext | high | Protocol used to communicate with brokers. <br>*Type: enum value*
 ssl.cipher.suites | * | | | low | A cipher suite is a named combination of authentication, encryption, MAC and key exchange algorithm used to negotiate the security settings for a network connection using TLS or SSL network protocol. See manual page for `ciphers(1)` and `SSL_CTX_set_cipher_list(3). <br>*Type: string*
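The CONFIGURATION.md diff above marks three properties as deprecated, slated for removal in librdkafka 3.x. A small hypothetical helper (not part of librdkafka or this commit) that flags those keys in a client configuration dict:

```python
# Properties this commit marks DEPRECATED in CONFIGURATION.md:
DEPRECATED_PROPERTIES = {
    "api.version.request",
    "api.version.fallback.ms",
    "broker.version.fallback",
}

def deprecated_keys(conf: dict) -> list:
    """Return, sorted, the deprecated librdkafka property names
    present in `conf` (illustrative helper only)."""
    return sorted(k for k in conf if k in DEPRECATED_PROPERTIES)

# Example: a config still pinning broker.version.fallback gets flagged.
conf = {"bootstrap.servers": "localhost:9092",
        "broker.version.fallback": "0.9.0"}
# deprecated_keys(conf) -> ["broker.version.fallback"]
```

Applications can use a check like this to surface the deprecation before the properties disappear, instead of relying on runtime warnings.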
