Skip to content

High CPU usage in librdkafka1.4.0 producer #2986

@yuz10

Description

@yuz10

Read the FAQ first: https://github.com/edenhill/librdkafka/wiki/FAQ

Description

High CPU usage in librdkafka1.4.0 producer, the below is system call,lots of poll and read call are sent in few miliseconds.

read(86, 0x108a2f3, 5)                  = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 906) = 1 ([{fd=86, revents=POLLOUT}])
read(86, 0x108a2f3, 5)                  = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 905) = 1 ([{fd=86, revents=POLLOUT}])
read(86, 0x108a2f3, 5)                  = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 905) = 1 ([{fd=86, revents=POLLOUT}])
read(86, 0x108a2f3, 5)                  = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 905) = 1 ([{fd=86, revents=POLLOUT}])
read(86, 0x108a2f3, 5)                  = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 905) = 1 ([{fd=86, revents=POLLOUT}])
read(86, 0x108a2f3, 5)                  = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 904) = 1 ([{fd=86, revents=POLLOUT}])
read(86, 0x108a2f3, 5)                  = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 904) = 1 ([{fd=86, revents=POLLOUT}])
read(86, 0x108a2f3, 5)                  = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 904) = 1 ([{fd=86, revents=POLLOUT}])
read(86, 0x108a2f3, 5)                  = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 904) = 1 ([{fd=86, revents=POLLOUT}])
read(86, 0x108a2f3, 5)                  = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 903) = 1 ([{fd=86, revents=POLLOUT}])
read(86, 0x108a2f3, 5)                  = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 856) = 1 ([{fd=86, revents=POLLOUT}])

stack:

Thread 1 (process 29217):
#0  0x00007f8b4028c22d in read () from /lib64/libpthread.so.0
#1  0x0000000000625c75 in sock_read ()
#2  0x00000000006245ac in bread_conv ()
#3  0x00000000006232a2 in BIO_read ()
#4  0x00000000005c4910 in ssl3_read_n ()
#5  0x00000000005c8cf6 in ssl3_get_record ()
#6  0x00000000005c65c3 in ssl3_read_bytes ()
#7  0x00000000005f74a8 in tls_get_message_header ()
#8  0x00000000005ed455 in state_machine ()
#9  0x00000000005d9488 in SSL_do_handshake ()
#10 0x00007f8aae993cab in rd_kafka_transport_ssl_handshake (rktrans=rktrans@entry=0x11f3d90) at rdkafka_ssl.c:578
#11 0x00007f8aae8ecae5 in rd_kafka_transport_io_event (rktrans=rktrans@entry=0x11f3d90, events=events@entry=4) at rdkafka_transport.c:686
#12 0x00007f8aae8ed4db in rd_kafka_transport_io_serve (rktrans=0x11f3d90, timeout_ms=43) at rdkafka_transport.c:818
#13 0x00007f8aae8d74ed in rd_kafka_broker_ops_io_serve (rkb=rkb@entry=0xf5a700, abs_timeout=<optimized out>) at rdkafka_broker.c:3167
#14 0x00007f8aae8d8c78 in rd_kafka_broker_producer_serve (abs_timeout=580904992026, rkb=0xf5a700) at rdkafka_broker.c:3758
#15 rd_kafka_broker_serve (rkb=rkb@entry=0xf5a700, timeout_ms=<optimized out>, timeout_ms@entry=1000) at rdkafka_broker.c:4806
#16 0x00007f8aae8d921d in rd_kafka_broker_thread_main (arg=arg@entry=0xf5a700) at rdkafka_broker.c:4941
#17 0x00007f8aae92fae7 in _thrd_wrapper_function (aArg=<optimized out>) at tinycthread.c:576
#18 0x00007f8b40285dc5 in start_thread () from /lib64/libpthread.so.0
#19 0x00007f8b3f7fb94d in clone () from /lib64/libc.so.6

or:

Thread 1 (process 29217):
#0  0x00007f8b3f8093c2 in __libc_disable_asynccancel () from /lib64/libc.so.6
#1  0x00007f8b3f7f12f9 in poll () from /lib64/libc.so.6
#2  0x00007f8aae8ed40e in rd_kafka_transport_poll (rktrans=rktrans@entry=0x11f3d90, tmout=tmout@entry=722) at rdkafka_transport.c:982
#3  0x00007f8aae8ed49f in rd_kafka_transport_io_serve (rktrans=0x11f3d90, timeout_ms=722) at rdkafka_transport.c:809
#4  0x00007f8aae8d74ed in rd_kafka_broker_ops_io_serve (rkb=rkb@entry=0xf5a700, abs_timeout=<optimized out>) at rdkafka_broker.c:3167
#5  0x00007f8aae8d8c78 in rd_kafka_broker_producer_serve (abs_timeout=580888839024, rkb=0xf5a700) at rdkafka_broker.c:3758
#6  rd_kafka_broker_serve (rkb=rkb@entry=0xf5a700, timeout_ms=<optimized out>, timeout_ms@entry=1000) at rdkafka_broker.c:4806
#7  0x00007f8aae8d921d in rd_kafka_broker_thread_main (arg=arg@entry=0xf5a700) at rdkafka_broker.c:4941
#8  0x00007f8aae92fae7 in _thrd_wrapper_function (aArg=<optimized out>) at tinycthread.c:576
#9  0x00007f8b40285dc5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f8b3f7fb94d in clone () from /lib64/libc.so.6

logs:

%4|1594656045.256|REQTMOUT|rdkafka#producer-1| [thrd:sasl_ssl://24.68.212.40:9097/bootstrap]: sasl_ssl://24.68.212.40:9097/1: Timed out 0 in-flight, 0 retry-queued, 1 out-queue, 0 partially-sent requests
%4|1594656049.535|REQTMOUT|rdkafka#producer-1| [thrd:sasl_ssl://24.68.212.42:9097/bootstrap]: sasl_ssl://24.68.212.42:9097/3: Timed out 0 in-flight, 0 retry-queued, 60 out-queue, 0 partially-sent requests
%4|1594656056.682|REQTMOUT|rdkafka#producer-1| [thrd:sasl_ssl://24.68.212.42:9097/bootstrap]: sasl_ssl://24.68.212.42:9097/3: Timed out 0 in-flight, 0 retry-queued, 60 out-queue, 0 partially-sent requests

How to reproduce

security.protocol=SASL_SSL
sasl.mechanisms=PLAIN
ssl.ca.location=xxx
sasl.username=xxx
sasl.password=xxx
compression.codec=snappy

statistics.interval.ms=100
broker.version.fallback=0.10.0.0
log.connection.close=false
api.version.fallback.ms=100
topic.metadata.refresh.interval.ms=90000
message.send.max.retries=3
queue.buffering.max.messages=1000000

start kafka producer ,and call rd_kafka_poll every 1 second, after a day or two,the case reproduces

IMPORTANT: Always try to reproduce the issue on the latest released version (see https://github.com/edenhill/librdkafka/releases), if it can't be reproduced on the latest version the issue has been fixed.

Checklist

IMPORTANT: We will close issues where the checklist has not been completed.

Please provide the following information:

  • librdkafka version (release number or git tag): <REPLACE with e.g., v0.10.5 or a git sha. NOT "latest" or "current">
  • Apache Kafka version: <REPLACE with e.g., 0.10.2.3>
  • librdkafka client configuration: <REPLACE with e.g., message.timeout.ms=123, auto.reset.offset=earliest, ..>
  • Operating system: <REPLACE with e.g., Centos 5 (x64)>
  • Provide logs (with debug=.. as necessary) from librdkafka
  • Provide broker log excerpts
  • Critical issue

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions