Skip to content

[CRASH] The server crashed when resizing the replication backlog size and running info at the same time #477

@paulmchen

Description

@paulmchen

Steps to reproduce this issue:

  1. Start KeyDB server by specifying repl-backlog-size and repl-backlog-disk-reserve as the following:

    save ""
    server-threads 4
    repl-backlog-size 1mb
    repl-backlog-disk-reserve 100mb

  2. At the meantime, run the following info command every 100 second:

    watch -n 0.1 -c 'date;./keydb-cli -h 192.168.0.221 -p 7001 info'

  3. Run the following benchmark command:

    ./keydb-benchmark -h 192.168.0.221 -c 200 -p 7001 -r 10000 -t set -d 500 -l -q

Note: it looks like the parallel executing the replication backlog resize causing deadlock and crash the server. However, it may not happen all the time.

object.cpp https://github.com/Snapchat/KeyDB/blob/main/src/object.cpp (line 1094-1095), it crashed when += mem size for the repl_backlog

if (g_pserver->repl_backlog)
mem += zmalloc_size(g_pserver->repl_backlog);

Do we need to "Lock" mem += operation here?

Crash report

423364:423390:M 27 Jul 2022 17:47:25.977 * Synchronization with replica 192.168.0.221:7002 succeeded
423364:423391:M 27 Jul 2022 17:47:26.966 # Replication backlog is too small, resizing to: 2048 bytes
423364:423391:M 27 Jul 2022 17:47:26.966 * Switching to disk backed replication backlog due to exceeding memory limits


=== KEYDB BUG REPORT START: Cut & paste starting from here ===
423364:423392:M 27 Jul 2022 17:47:27.011 # KeyDB 255.255.255 crashed by signal: 11, si_code: 1
423364:423392:M 27 Jul 2022 17:47:27.011 # Accessing address: 0x18e000
423364:423392:M 27 Jul 2022 17:47:27.011 # Crashed running the instruction at: 0xab26ec

------ STACK TRACE ------
EIP:
./keydb-server-init 192.168.0.221:7001(je_malloc_usable_size+0x5c) [0xab26ec]

Backtrace:
/usr/lib64/libpthread.so.0(+0x135a0) [0x7fb1ab9085a0]
./keydb-server-init 192.168.0.221:7001(je_malloc_usable_size+0x5c) [0xab26ec]
./keydb-server-init 192.168.0.221:7001(getMemoryOverheadData()+0x1af) [0x65a56f]
./keydb-server-init 192.168.0.221:7001(genRedisInfoString(client*, char const*)+0x48f) [0x69256f]
./keydb-server-init 192.168.0.221:7001(infoCommand(client*)+0x33) [0x6941e3]
./keydb-server-init 192.168.0.221:7001(call(client*, int)+0xa1) [0x695621]
./keydb-server-init 192.168.0.221:7001(processCommand(client*, int)+0x820) [0x696650]
./keydb-server-init 192.168.0.221:7001(processCommandAndResetClient(client*, int)+0x65) [0x5b2255]
./keydb-server-init 192.168.0.221:7001(processInputBuffer(client*, bool, int)+0x174) [0x5b4e74]
./keydb-server-init 192.168.0.221:7001(processClients()+0xc9) [0x5b6369]
./keydb-server-init 192.168.0.221:7001() [0x6aa4f2]
./keydb-server-init 192.168.0.221:7001(beforeSleep(aeEventLoop*)+0x16e) [0x59bc5e]
./keydb-server-init 192.168.0.221:7001(aeProcessEvents+0xe8) [0x597868]
./keydb-server-init 192.168.0.221:7001(aeMain+0x37) [0x59b957]
./keydb-server-init 192.168.0.221:7001(workerThreadMain(void*)+0x64) [0x6aa584]
/usr/lib64/libpthread.so.0(+0x8f3b) [0x7fb1ab8fdf3b]
/usr/lib64/libc.so.6(clone+0x40) [0x7fb1ab835810]

------ REGISTERS ------
423364:423392:M 27 Jul 2022 17:47:27.013 # 
RAX:000000000018e000 RBX:00007fb171c00000
RCX:00007fb171c00000 RDX:0000000000000001
RDI:00007fb194efbbd0 RSI:00007fb194efbb10
RBP:0000000000000004 RSP:00007fb194efa820
R8 :0000000000000000 R9 :00007fb140000000
R10:0000000000000006 R11:00000000000000b9
R12:0000000000394b88 R13:00000000011b0700
R14:00000000011b0700 R15:0000000000000002
RIP:0000000000ab26ec EFL:0000000000010206
CSGSFS:002b000000000033
423364:423392:M 27 Jul 2022 17:47:27.013 # (00007fb194efa82f) -> 00007fb19627f005
423364:423392:M 27 Jul 2022 17:47:27.013 # (00007fb194efa82e) -> 0000002c00000001
423364:423392:M 27 Jul 2022 17:47:27.013 # (00007fb194efa82d) -> 00007fb194efaf38
423364:423392:M 27 Jul 2022 17:47:27.013 # (00007fb194efa82c) -> 000000000000b000
423364:423392:M 27 Jul 2022 17:47:27.013 # (00007fb194efa82b) -> 00007fb194000900
423364:423392:M 27 Jul 2022 17:47:27.013 # (00007fb194efa82a) -> 0000000000b17223
423364:423392:M 27 Jul 2022 17:47:27.013 # (00007fb194efa829) -> 0000000000000000
423364:423392:M 27 Jul 2022 17:47:27.013 # (00007fb194efa828) -> 0000000000b3f8ed
423364:423392:M 27 Jul 2022 17:47:27.013 # (00007fb194efa827) -> 0000000000b3f8f6
423364:423392:M 27 Jul 2022 17:47:27.013 # (00007fb194efa826) -> 0000000000b17481
423364:423392:M 27 Jul 2022 17:47:27.014 # (00007fb194efa825) -> 0000000000f94160
423364:423392:M 27 Jul 2022 17:47:27.014 # (00007fb194efa824) -> 00007fb194efba90
423364:423392:M 27 Jul 2022 17:47:27.014 # (00007fb194efa823) -> 00007fb194010f00
423364:423392:M 27 Jul 2022 17:47:27.014 # (00007fb194efa822) -> 00007fb194efa840
423364:423392:M 27 Jul 2022 17:47:27.014 # (00007fb194efa821) -> 00007fb194efa9f0
423364:423392:M 27 Jul 2022 17:47:27.014 # (00007fb194efa820) -> 0000003000000030

------ INFO OUTPUT ------
<pre>
Core dump:

(gdb) bt
#0  atomic_load_p (mo=atomic_memory_order_relaxed, a=0x18e000) at include/jemalloc/internal/atomic.h:62
#1  rtree_leaf_elm_bits_read (dependent=true, elm=0x18e000, rtree=, tsdn=) at include/jemalloc/internal/rtree.h:175
#2  rtree_leaf_elm_szind_read (dependent=true, elm=0x18e000, rtree=, tsdn=) at include/jemalloc/internal/rtree.h:227
#3  rtree_szind_read (dependent=true, key=140400094347264, rtree_ctx=, rtree=, tsdn=) at include/jemalloc/internal/rtree.h:434
#4  arena_salloc (ptr=, tsdn=) at include/jemalloc/internal/arena_inlines_b.h:191
#5  isalloc (ptr=, tsdn=) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:38
#6  je_malloc_usable_size (ptr=0x7fb171c00000) at src/jemalloc.c:3740
#7  0x000000000065a56f in _Z21getMemoryOverheadDatav () at object.cpp:1094
#8  0x000000000069256f in genRedisInfoString (c=0x0, section=0xb2383c "all") at server.cpp:5687
#9  0x00000000005ee9e1 in logServerInfo () at debug.cpp:1748
#10 0x00000000005eea43 in printCrashReport () at debug.cpp:2013
#11 0x00000000005eeaec in sigsegvHandler (sig=11, info=0x7fb194ef9c70, secret=0x7fb194ef9b40) at debug.cpp:1999
#12 
#13 atomic_load_p (mo=atomic_memory_order_relaxed, a=0x18e000) at include/jemalloc/internal/atomic.h:62
#14 rtree_leaf_elm_bits_read (dependent=true, elm=0x18e000, rtree=, tsdn=) at include/jemalloc/internal/rtree.h:175
#15 rtree_leaf_elm_szind_read (dependent=true, elm=0x18e000, rtree=, tsdn=) at include/jemalloc/internal/rtree.h:227
#16 rtree_szind_read (dependent=true, key=140400094347264, rtree_ctx=, rtree=, tsdn=) at include/jemalloc/internal/rtree.h:434
#17 arena_salloc (ptr=, tsdn=) at include/jemalloc/internal/arena_inlines_b.h:191
#18 isalloc (ptr=, tsdn=) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:38
#19 je_malloc_usable_size (ptr=0x7fb171c00000) at src/jemalloc.c:3740
#20 0x000000000065a56f in _Z21getMemoryOverheadDatav () at object.cpp:1094
#21 0x000000000069256f in genRedisInfoString (c=c@entry=0x7fb185c56600, section=0xb26e30 "default") at server.cpp:5687
#22 0x00000000006941e3 in infoCommand (c=, c=) at server.cpp:6358
#23 0x0000000000695621 in call (c=c@entry=0x7fb185c56600, flags=flags@entry=31) at server.cpp:4488
#24 0x0000000000696650 in processCommand (c=0x7fb185c56600, callFlags=31) at server.cpp:5067
#25 0x00000000005b2255 in processCommandAndResetClient (c=c@entry=0x7fb185c56600, flags=flags@entry=31) at networking.cpp:2616
#26 0x00000000005b4e74 in processInputBuffer (c=c@entry=0x7fb185c56600, fParse=fParse@entry=false, callFlags=callFlags@entry=31) at networking.cpp:2772
#27 0x00000000005b6369 in processClients () at networking.cpp:2922
#28 0x00000000006aa4f2 in _Z25runAndPropogateToReplicasIFvvEJEEvPT_DpT0_ (pfn=) at server.h:3902
#29 0x000000000059bc5e in beforeSleep (eventLoop=0x7fb1ab232900) at server.cpp:2807
#30 0x0000000000597868 in aeProcessEvents (eventLoop=eventLoop@entry=0x7fb1ab232900, flags=flags@entry=27) at ae.cpp:710
#31 0x000000000059b957 in aeMain (eventLoop=) at ae.cpp:770
#32 0x00000000006aa584 in _Z16workerThreadMainPv (parg=0x2) at server.cpp:7324
#33 0x00007fb1ab8fdf3b in ?? () from /usr/lib64/libpthread.so.0
#34 0x00007fb1ab835810 in clone () from /usr/lib64/libc.so.6
</pre>

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions