-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Description
Read the FAQ first: https://github.com/edenhill/librdkafka/wiki/FAQ
Description
High CPU usage in librdkafka1.4.0 producer, the below is system call,lots of poll and read call are sent in few miliseconds.
read(86, 0x108a2f3, 5) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 906) = 1 ([{fd=86, revents=POLLOUT}])
read(86, 0x108a2f3, 5) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 905) = 1 ([{fd=86, revents=POLLOUT}])
read(86, 0x108a2f3, 5) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 905) = 1 ([{fd=86, revents=POLLOUT}])
read(86, 0x108a2f3, 5) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 905) = 1 ([{fd=86, revents=POLLOUT}])
read(86, 0x108a2f3, 5) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 905) = 1 ([{fd=86, revents=POLLOUT}])
read(86, 0x108a2f3, 5) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 904) = 1 ([{fd=86, revents=POLLOUT}])
read(86, 0x108a2f3, 5) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 904) = 1 ([{fd=86, revents=POLLOUT}])
read(86, 0x108a2f3, 5) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 904) = 1 ([{fd=86, revents=POLLOUT}])
read(86, 0x108a2f3, 5) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 904) = 1 ([{fd=86, revents=POLLOUT}])
read(86, 0x108a2f3, 5) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 903) = 1 ([{fd=86, revents=POLLOUT}])
read(86, 0x108a2f3, 5) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=86, events=POLLIN|POLLOUT}, {fd=63, events=POLLIN}], 2, 856) = 1 ([{fd=86, revents=POLLOUT}])
stack:
Thread 1 (process 29217):
#0 0x00007f8b4028c22d in read () from /lib64/libpthread.so.0
#1 0x0000000000625c75 in sock_read ()
#2 0x00000000006245ac in bread_conv ()
#3 0x00000000006232a2 in BIO_read ()
#4 0x00000000005c4910 in ssl3_read_n ()
#5 0x00000000005c8cf6 in ssl3_get_record ()
#6 0x00000000005c65c3 in ssl3_read_bytes ()
#7 0x00000000005f74a8 in tls_get_message_header ()
#8 0x00000000005ed455 in state_machine ()
#9 0x00000000005d9488 in SSL_do_handshake ()
#10 0x00007f8aae993cab in rd_kafka_transport_ssl_handshake (rktrans=rktrans@entry=0x11f3d90) at rdkafka_ssl.c:578
#11 0x00007f8aae8ecae5 in rd_kafka_transport_io_event (rktrans=rktrans@entry=0x11f3d90, events=events@entry=4) at rdkafka_transport.c:686
#12 0x00007f8aae8ed4db in rd_kafka_transport_io_serve (rktrans=0x11f3d90, timeout_ms=43) at rdkafka_transport.c:818
#13 0x00007f8aae8d74ed in rd_kafka_broker_ops_io_serve (rkb=rkb@entry=0xf5a700, abs_timeout=<optimized out>) at rdkafka_broker.c:3167
#14 0x00007f8aae8d8c78 in rd_kafka_broker_producer_serve (abs_timeout=580904992026, rkb=0xf5a700) at rdkafka_broker.c:3758
#15 rd_kafka_broker_serve (rkb=rkb@entry=0xf5a700, timeout_ms=<optimized out>, timeout_ms@entry=1000) at rdkafka_broker.c:4806
#16 0x00007f8aae8d921d in rd_kafka_broker_thread_main (arg=arg@entry=0xf5a700) at rdkafka_broker.c:4941
#17 0x00007f8aae92fae7 in _thrd_wrapper_function (aArg=<optimized out>) at tinycthread.c:576
#18 0x00007f8b40285dc5 in start_thread () from /lib64/libpthread.so.0
#19 0x00007f8b3f7fb94d in clone () from /lib64/libc.so.6
or:
Thread 1 (process 29217):
#0 0x00007f8b3f8093c2 in __libc_disable_asynccancel () from /lib64/libc.so.6
#1 0x00007f8b3f7f12f9 in poll () from /lib64/libc.so.6
#2 0x00007f8aae8ed40e in rd_kafka_transport_poll (rktrans=rktrans@entry=0x11f3d90, tmout=tmout@entry=722) at rdkafka_transport.c:982
#3 0x00007f8aae8ed49f in rd_kafka_transport_io_serve (rktrans=0x11f3d90, timeout_ms=722) at rdkafka_transport.c:809
#4 0x00007f8aae8d74ed in rd_kafka_broker_ops_io_serve (rkb=rkb@entry=0xf5a700, abs_timeout=<optimized out>) at rdkafka_broker.c:3167
#5 0x00007f8aae8d8c78 in rd_kafka_broker_producer_serve (abs_timeout=580888839024, rkb=0xf5a700) at rdkafka_broker.c:3758
#6 rd_kafka_broker_serve (rkb=rkb@entry=0xf5a700, timeout_ms=<optimized out>, timeout_ms@entry=1000) at rdkafka_broker.c:4806
#7 0x00007f8aae8d921d in rd_kafka_broker_thread_main (arg=arg@entry=0xf5a700) at rdkafka_broker.c:4941
#8 0x00007f8aae92fae7 in _thrd_wrapper_function (aArg=<optimized out>) at tinycthread.c:576
#9 0x00007f8b40285dc5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f8b3f7fb94d in clone () from /lib64/libc.so.6
logs:
%4|1594656045.256|REQTMOUT|rdkafka#producer-1| [thrd:sasl_ssl://24.68.212.40:9097/bootstrap]: sasl_ssl://24.68.212.40:9097/1: Timed out 0 in-flight, 0 retry-queued, 1 out-queue, 0 partially-sent requests
%4|1594656049.535|REQTMOUT|rdkafka#producer-1| [thrd:sasl_ssl://24.68.212.42:9097/bootstrap]: sasl_ssl://24.68.212.42:9097/3: Timed out 0 in-flight, 0 retry-queued, 60 out-queue, 0 partially-sent requests
%4|1594656056.682|REQTMOUT|rdkafka#producer-1| [thrd:sasl_ssl://24.68.212.42:9097/bootstrap]: sasl_ssl://24.68.212.42:9097/3: Timed out 0 in-flight, 0 retry-queued, 60 out-queue, 0 partially-sent requests
How to reproduce
security.protocol=SASL_SSL
sasl.mechanisms=PLAIN
ssl.ca.location=xxx
sasl.username=xxx
sasl.password=xxx
compression.codec=snappy
statistics.interval.ms=100
broker.version.fallback=0.10.0.0
log.connection.close=false
api.version.fallback.ms=100
topic.metadata.refresh.interval.ms=90000
message.send.max.retries=3
queue.buffering.max.messages=1000000
start kafka producer ,and call rd_kafka_poll every 1 second, after a day or two,the case reproduces
IMPORTANT: Always try to reproduce the issue on the latest released version (see https://github.com/edenhill/librdkafka/releases), if it can't be reproduced on the latest version the issue has been fixed.
Checklist
IMPORTANT: We will close issues where the checklist has not been completed.
Please provide the following information:
- librdkafka version (release number or git tag):
<REPLACE with e.g., v0.10.5 or a git sha. NOT "latest" or "current"> - Apache Kafka version:
<REPLACE with e.g., 0.10.2.3> - librdkafka client configuration:
<REPLACE with e.g., message.timeout.ms=123, auto.reset.offset=earliest, ..> - Operating system:
<REPLACE with e.g., Centos 5 (x64)> - Provide logs (with
debug=..as necessary) from librdkafka - Provide broker log excerpts
- Critical issue