
Crash when producer gets different metadata. #1884

@hulaxiaodai


Read the FAQ first: https://github.com/edenhill/librdkafka/wiki/FAQ

Description

We have a very large Kafka cluster (1000+ nodes), and I use the librdkafka C++ producer to send messages to it.
When we increased the partition count of a topic, one of the brokers was out of sync, so the producer may have received stale metadata.
The log shows: LOG:5PARTCNT[thrd:main]: Topic org.xxxxxxx partition count changed from 16 to 10
However, the topic actually has 22 partitions.

The producer then crashed.
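
To illustrate the symptom: Kafka does not support reducing the partition count of an existing topic, so a reported change from 16 to 10 looks like stale metadata rather than a real change. The following is only a minimal sketch of an application-side sanity check under that assumption. `PartitionCountGuard` and its methods are hypothetical names, not part of librdkafka, and this is not a description of how librdkafka handles metadata internally. Note that a topic that is deleted and recreated with fewer partitions would also be flagged by this check.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical helper (not a librdkafka API): remembers the highest
// partition count seen per topic and flags any report that would shrink it.
class PartitionCountGuard {
public:
    enum class Verdict { Accepted, Unchanged, RejectedAsStale };

    // Offer a partition count taken from a metadata response.
    Verdict offer(const std::string& topic, int32_t reported) {
        assert(reported >= 0 && "partition counts are never negative");
        auto it = known_.find(topic);
        if (it == known_.end()) {
            // First report for this topic: nothing to compare against.
            known_.emplace(topic, reported);
            return Verdict::Accepted;
        }
        if (reported < it->second) {
            // A shrinking count is treated as stale; the stored value is kept.
            return Verdict::RejectedAsStale;
        }
        if (reported == it->second) {
            return Verdict::Unchanged;
        }
        it->second = reported;  // Growth is the only accepted change.
        return Verdict::Accepted;
    }

    // Highest count accepted so far, or -1 if the topic is unknown.
    int32_t known(const std::string& topic) const {
        auto it = known_.find(topic);
        return it == known_.end() ? -1 : it->second;
    }

private:
    std::unordered_map<std::string, int32_t> known_;
};
```

With the numbers from the log above, offering 16 and then 10 for the same topic would reject the second report and keep 16, while a later report of 22 would be accepted.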

I got a core file, core-rdk:broker405-31053-1531725863, which is 89G in size.
The backtrace is:
#0 0x00007f26ecaff495 in raise () from /lib64/libc.so.6
#1 0x00007f26ecb00c75 in abort () from /lib64/libc.so.6
#2 0x00000000005ad293 in rd_kafka_crash () at rdkafka.c:3367
#3 0x0000000000604df5 in rd_kafka_toppar_destroy_final () at rdkafka_partition.c:269
#4 0x00000000005e94e8 in rd_kafka_handle_Produce () at rdkafka_request.c:1934
#5 0x00000000005dc896 in rd_kafka_buf_callback () at rdkafka_buf.c:444
#6 0x00000000005c17ba in rd_kafka_recv () at rdkafka_broker.c:1288
#7 0x00000000005d9ed0 in rd_kafka_transport_io_event () at rdkafka_transport.c:1419
#8 0x00000000005c8a40 in rd_kafka_broker_serve () at rdkafka_broker.c:2533
#9 0x00000000005ca499 in rd_kafka_broker_thread_main () at rdkafka_broker.c:2820
#10 0x0000000000616a17 in _thrd_wrapper_function ()
#11 0x00007f26ece68aa1 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f26ecbb5bcd in clone () from /lib64/libc.so.6

How to reproduce

I cannot reproduce it every time. It depends on broker state, and it is difficult to force a single broker out of sync.

Checklist

  • librdkafka version (release number or git tag): v0.11.5-PRE7
  • Apache Kafka version: 0.8.2.1
  • librdkafka client configuration:
    metadata.broker.list=rz-data-rt023:9092,rz-data-rt198:9092,gh-data-rt0774:9092,gh-data-rt1066:9092
    api.version.request=false
    broker.version.fallback=0.8.2.1
    queue.buffering.max.messages=300000
    message.max.bytes=4000000
    topic.metadata.refresh.interval.ms=300000
    metadata.max.age.ms=1500000
    queue.buffering.max.ms=5
    batch.num.messages=5000
    message.send.max.retries=1
    message.timeout.ms=900000
    request.required.acks=1
    request.timeout.ms=900000
    socket.max.fails=0
    log.connection.close=false
    socket.keepalive.enable=true
    queue.buffering.backpressure.threshold=0
  • Operating system: CentOS 6
