Description
Steps to reproduce this issue:
- Start the KeyDB server with repl-backlog-size and repl-backlog-disk-reserve set as follows:
save ""
server-threads 4
repl-backlog-size 1mb
repl-backlog-disk-reserve 100mb
- In the meantime, run the following info command every 100 milliseconds:
watch -n 0.1 -c 'date;./keydb-cli -h 192.168.0.221 -p 7001 info'
- Run the following benchmark command:
./keydb-benchmark -h 192.168.0.221 -c 200 -p 7001 -r 10000 -t set -d 500 -l -q
Note: it looks like resizing the replication backlog while other threads are running causes a deadlock and crashes the server. However, it does not happen every time.
The crash occurs in object.cpp https://github.com/Snapchat/KeyDB/blob/main/src/object.cpp (lines 1094-1095), when the allocation size of repl_backlog is added to mem:
if (g_pserver->repl_backlog)
mem += zmalloc_size(g_pserver->repl_backlog);
Do we need to take a lock around the mem += operation here?
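To make the question concrete, here is a minimal sketch of the pattern being asked about: a reader that reports a buffer's size while another thread replaces that buffer. It is not KeyDB code. `Backlog`, `resizeBacklog`, `backlogOverhead` and `runDemo` are hypothetical names, plain `malloc`/`free` stand in for the server's allocator, and the mutex is only one possible way to serialize the two sides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the server state; not KeyDB's real structure.
struct Backlog {
    std::mutex lock;       // guards buf and size together
    char *buf = nullptr;
    std::size_t size = 0;
};

// Writer side: swap in a new buffer, as a backlog resize would.
void resizeBacklog(Backlog &b, std::size_t newSize) {
    char *fresh = static_cast<char *>(std::malloc(newSize));
    std::lock_guard<std::mutex> guard(b.lock);
    std::free(b.buf);
    b.buf = fresh;
    b.size = fresh ? newSize : 0;
}

// Reader side: what an INFO-style memory report would do.
// Holding the lock means the pointer cannot be freed or replaced
// between the null check and the size lookup.
std::size_t backlogOverhead(Backlog &b) {
    std::lock_guard<std::mutex> guard(b.lock);
    return b.buf ? b.size : 0;
}

// Run one resizing thread against a polling reader and return the
// last size the reader observed.
std::size_t runDemo() {
    Backlog b;
    resizeBacklog(b, 1024);
    std::thread writer([&b] {
        for (int i = 1; i <= 1000; ++i)
            resizeBacklog(b, 1024 * (1 + i % 4));
    });
    std::size_t last = 0;
    for (int i = 0; i < 1000; ++i)
        last = backlogOverhead(b);
    writer.join();
    assert(last % 1024 == 0);  // reader only ever sees a complete update
    std::free(b.buf);
    return last;
}
```

Without the guard in `backlogOverhead`, the reader could pass a pointer to the allocator after the writer has already released it, which is the kind of failure the backtrace would be consistent with. Whether that is what actually happens in getMemoryOverheadData() is for the maintainers to confirm.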
Crash report
423364:423390:M 27 Jul 2022 17:47:25.977 * Synchronization with replica 192.168.0.221:7002 succeeded
423364:423391:M 27 Jul 2022 17:47:26.966 # Replication backlog is too small, resizing to: 2048 bytes
423364:423391:M 27 Jul 2022 17:47:26.966 * Switching to disk backed replication backlog due to exceeding memory limits
=== KEYDB BUG REPORT START: Cut & paste starting from here ===
423364:423392:M 27 Jul 2022 17:47:27.011 # KeyDB 255.255.255 crashed by signal: 11, si_code: 1
423364:423392:M 27 Jul 2022 17:47:27.011 # Accessing address: 0x18e000
423364:423392:M 27 Jul 2022 17:47:27.011 # Crashed running the instruction at: 0xab26ec
------ STACK TRACE ------
EIP:
./keydb-server-init 192.168.0.221:7001(je_malloc_usable_size+0x5c) [0xab26ec]
Backtrace:
/usr/lib64/libpthread.so.0(+0x135a0) [0x7fb1ab9085a0]
./keydb-server-init 192.168.0.221:7001(je_malloc_usable_size+0x5c) [0xab26ec]
./keydb-server-init 192.168.0.221:7001(getMemoryOverheadData()+0x1af) [0x65a56f]
./keydb-server-init 192.168.0.221:7001(genRedisInfoString(client*, char const*)+0x48f) [0x69256f]
./keydb-server-init 192.168.0.221:7001(infoCommand(client*)+0x33) [0x6941e3]
./keydb-server-init 192.168.0.221:7001(call(client*, int)+0xa1) [0x695621]
./keydb-server-init 192.168.0.221:7001(processCommand(client*, int)+0x820) [0x696650]
./keydb-server-init 192.168.0.221:7001(processCommandAndResetClient(client*, int)+0x65) [0x5b2255]
./keydb-server-init 192.168.0.221:7001(processInputBuffer(client*, bool, int)+0x174) [0x5b4e74]
./keydb-server-init 192.168.0.221:7001(processClients()+0xc9) [0x5b6369]
./keydb-server-init 192.168.0.221:7001() [0x6aa4f2]
./keydb-server-init 192.168.0.221:7001(beforeSleep(aeEventLoop*)+0x16e) [0x59bc5e]
./keydb-server-init 192.168.0.221:7001(aeProcessEvents+0xe8) [0x597868]
./keydb-server-init 192.168.0.221:7001(aeMain+0x37) [0x59b957]
./keydb-server-init 192.168.0.221:7001(workerThreadMain(void*)+0x64) [0x6aa584]
/usr/lib64/libpthread.so.0(+0x8f3b) [0x7fb1ab8fdf3b]
/usr/lib64/libc.so.6(clone+0x40) [0x7fb1ab835810]
------ REGISTERS ------
423364:423392:M 27 Jul 2022 17:47:27.013 # RAX:000000000018e000 RBX:00007fb171c00000 RCX:00007fb171c00000 RDX:0000000000000001 RDI:00007fb194efbbd0 RSI:00007fb194efbb10 RBP:0000000000000004 RSP:00007fb194efa820 R8 :0000000000000000 R9 :00007fb140000000 R10:0000000000000006 R11:00000000000000b9 R12:0000000000394b88 R13:00000000011b0700 R14:00000000011b0700 R15:0000000000000002 RIP:0000000000ab26ec EFL:0000000000010206 CSGSFS:002b000000000033
423364:423392:M 27 Jul 2022 17:47:27.013 # (00007fb194efa82f) -> 00007fb19627f005
423364:423392:M 27 Jul 2022 17:47:27.013 # (00007fb194efa82e) -> 0000002c00000001
423364:423392:M 27 Jul 2022 17:47:27.013 # (00007fb194efa82d) -> 00007fb194efaf38
423364:423392:M 27 Jul 2022 17:47:27.013 # (00007fb194efa82c) -> 000000000000b000
423364:423392:M 27 Jul 2022 17:47:27.013 # (00007fb194efa82b) -> 00007fb194000900
423364:423392:M 27 Jul 2022 17:47:27.013 # (00007fb194efa82a) -> 0000000000b17223
423364:423392:M 27 Jul 2022 17:47:27.013 # (00007fb194efa829) -> 0000000000000000
423364:423392:M 27 Jul 2022 17:47:27.013 # (00007fb194efa828) -> 0000000000b3f8ed
423364:423392:M 27 Jul 2022 17:47:27.013 # (00007fb194efa827) -> 0000000000b3f8f6
423364:423392:M 27 Jul 2022 17:47:27.013 # (00007fb194efa826) -> 0000000000b17481
423364:423392:M 27 Jul 2022 17:47:27.014 # (00007fb194efa825) -> 0000000000f94160
423364:423392:M 27 Jul 2022 17:47:27.014 # (00007fb194efa824) -> 00007fb194efba90
423364:423392:M 27 Jul 2022 17:47:27.014 # (00007fb194efa823) -> 00007fb194010f00
423364:423392:M 27 Jul 2022 17:47:27.014 # (00007fb194efa822) -> 00007fb194efa840
423364:423392:M 27 Jul 2022 17:47:27.014 # (00007fb194efa821) -> 00007fb194efa9f0
423364:423392:M 27 Jul 2022 17:47:27.014 # (00007fb194efa820) -> 0000003000000030
------ INFO OUTPUT ------
<pre>
Core dump:
(gdb) bt
#0 atomic_load_p (mo=atomic_memory_order_relaxed, a=0x18e000) at include/jemalloc/internal/atomic.h:62
#1 rtree_leaf_elm_bits_read (dependent=true, elm=0x18e000, rtree=, tsdn=) at include/jemalloc/internal/rtree.h:175
#2 rtree_leaf_elm_szind_read (dependent=true, elm=0x18e000, rtree=, tsdn=) at include/jemalloc/internal/rtree.h:227
#3 rtree_szind_read (dependent=true, key=140400094347264, rtree_ctx=, rtree=, tsdn=) at include/jemalloc/internal/rtree.h:434
#4 arena_salloc (ptr=, tsdn=) at include/jemalloc/internal/arena_inlines_b.h:191
#5 isalloc (ptr=, tsdn=) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:38
#6 je_malloc_usable_size (ptr=0x7fb171c00000) at src/jemalloc.c:3740
#7 0x000000000065a56f in _Z21getMemoryOverheadDatav () at object.cpp:1094
#8 0x000000000069256f in genRedisInfoString (c=0x0, section=0xb2383c "all") at server.cpp:5687
#9 0x00000000005ee9e1 in logServerInfo () at debug.cpp:1748
#10 0x00000000005eea43 in printCrashReport () at debug.cpp:2013
#11 0x00000000005eeaec in sigsegvHandler (sig=11, info=0x7fb194ef9c70, secret=0x7fb194ef9b40) at debug.cpp:1999
#12
#13 atomic_load_p (mo=atomic_memory_order_relaxed, a=0x18e000) at include/jemalloc/internal/atomic.h:62
#14 rtree_leaf_elm_bits_read (dependent=true, elm=0x18e000, rtree=, tsdn=) at include/jemalloc/internal/rtree.h:175
#15 rtree_leaf_elm_szind_read (dependent=true, elm=0x18e000, rtree=, tsdn=) at include/jemalloc/internal/rtree.h:227
#16 rtree_szind_read (dependent=true, key=140400094347264, rtree_ctx=, rtree=, tsdn=) at include/jemalloc/internal/rtree.h:434
#17 arena_salloc (ptr=, tsdn=) at include/jemalloc/internal/arena_inlines_b.h:191
#18 isalloc (ptr=, tsdn=) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:38
#19 je_malloc_usable_size (ptr=0x7fb171c00000) at src/jemalloc.c:3740
#20 0x000000000065a56f in _Z21getMemoryOverheadDatav () at object.cpp:1094
#21 0x000000000069256f in genRedisInfoString (c=c@entry=0x7fb185c56600, section=0xb26e30 "default") at server.cpp:5687
#22 0x00000000006941e3 in infoCommand (c=, c=) at server.cpp:6358
#23 0x0000000000695621 in call (c=c@entry=0x7fb185c56600, flags=flags@entry=31) at server.cpp:4488
#24 0x0000000000696650 in processCommand (c=0x7fb185c56600, callFlags=31) at server.cpp:5067
#25 0x00000000005b2255 in processCommandAndResetClient (c=c@entry=0x7fb185c56600, flags=flags@entry=31) at networking.cpp:2616
#26 0x00000000005b4e74 in processInputBuffer (c=c@entry=0x7fb185c56600, fParse=fParse@entry=false, callFlags=callFlags@entry=31) at networking.cpp:2772
#27 0x00000000005b6369 in processClients () at networking.cpp:2922
#28 0x00000000006aa4f2 in _Z25runAndPropogateToReplicasIFvvEJEEvPT_DpT0_ (pfn=) at server.h:3902
#29 0x000000000059bc5e in beforeSleep (eventLoop=0x7fb1ab232900) at server.cpp:2807
#30 0x0000000000597868 in aeProcessEvents (eventLoop=eventLoop@entry=0x7fb1ab232900, flags=flags@entry=27) at ae.cpp:710
#31 0x000000000059b957 in aeMain (eventLoop=) at ae.cpp:770
#32 0x00000000006aa584 in _Z16workerThreadMainPv (parg=0x2) at server.cpp:7324
#33 0x00007fb1ab8fdf3b in ?? () from /usr/lib64/libpthread.so.0
#34 0x00007fb1ab835810 in clone () from /usr/lib64/libc.so.6
</pre>