Skip to content

Active-defrag crash while deleting keys #257

@Hitobat

Description

@Hitobat

Describe the bug
Active defrag causing crash when deleting keys.

** Log Files **

10603:11460:S 10 Nov 2020 11:07:41.060 # KeyDB 6.0.16 crashed by signal: 11
10603:11460:S 10 Nov 2020 11:07:41.060 # Crashed running the instruction at: 0x55598527e30d
10603:11460:S 10 Nov 2020 11:07:41.060 # Accessing address: 0x8
10603:11460:S 10 Nov 2020 11:07:41.060 # Failed assertion: <no assertion failed> (<no file>:0)

------ STACK TRACE ------
EIP:
/usr/bin/keydb-server *:7001(activeDefragStringOb(redisObject*, long*)+0x4d) [0x55598527e30d]

Backtrace:
/usr/bin/keydb-server *:7001(logStackTrace(ucontext_t*)+0x6b) [0x55598523848b]
/usr/bin/keydb-server *:7001(sigsegvHandler(int, siginfo_t*, void*)+0xc9) [0x555985238c29]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0) [0x7fd8a24868a0]
/usr/bin/keydb-server *:7001(activeDefragStringOb(redisObject*, long*)+0x4d) [0x55598527e30d]
/usr/bin/keydb-server *:7001(defragKey(redisDb*, dictEntry*)+0x78) [0x555985281b28]
/usr/bin/keydb-server *:7001(defragScanCallback(void*, dictEntry const*)+0x9) [0x555985281de9]
/usr/bin/keydb-server *:7001(dictScan+0x212) [0x5559851d9f02]
/usr/bin/keydb-server *:7001(activeDefragCycle()+0x360) [0x55598527f8b0]
/usr/bin/keydb-server *:7001(databasesCron()+0x39) [0x5559851e3369]
/usr/bin/keydb-server *:7001(serverCron(aeEventLoop*, long long, void*)+0x29a) [0x5559851e36ea]
/usr/bin/keydb-server *:7001(aeProcessEvents+0x33f) [0x5559851d6aef]
/usr/bin/keydb-server *:7001(aeMain+0x45) [0x5559851d6e65]
/usr/bin/keydb-server *:7001(workerThreadMain(void*)+0x74) [0x5559851dd7f4]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7fd8a247b6db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7fd8a21a4a3f]

------ INFO OUTPUT ------
# Server
redis_version:6.0.16
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:fa6f9f5195421066
redis_mode:standalone
os:Linux 5.3.0-1018-gcp x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:7.3.0
process_id:10603
run_id:e101ca0ece4d57a852d21144923aa5044c72edae
tcp_port:7001
uptime_in_seconds:1219969
uptime_in_days:14
hz:10
configured_hz:10
lru_clock:11170940
executable:/usr/bin/keydb-server
config_file:/etc/keydb/keydb.conf

# Clients
connected_clients:26
client_recent_max_input_buffer:80650
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
current_client_thread:0
thread_0_clients:1
thread_1_clients:18
thread_2_clients:7
thread_3_clients:0

# Memory
used_memory:132721567520
used_memory_human:123.61G
used_memory_rss:147845615616
used_memory_rss_human:137.69G
used_memory_peak:143223145848
used_memory_peak_human:133.39G
used_memory_peak_perc:92.67%
used_memory_overhead:16522727116
used_memory_startup:9073784
used_memory_dataset:116198840404
used_memory_dataset_perc:87.56%
allocator_allocated:132723082720
allocator_active:145995980800
allocator_resident:147891363840
total_system_memory:219545796608
total_system_memory_human:204.47G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:193273528320
maxmemory_human:180.00G
maxmemory_policy:allkeys-lru
allocator_frag_ratio:1.10
allocator_frag_bytes:13272898080
allocator_rss_ratio:1.01
allocator_rss_bytes:1895383040
rss_overhead_ratio:1.00
rss_overhead_bytes:-45748224
mem_fragmentation_ratio:1.11
mem_fragmentation_bytes:15124049632
mem_not_counted_for_evict:0
mem_replication_backlog:58720256
mem_clients_slaves:0
mem_clients_normal:643468
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:1
lazyfree_pending_objects:0

# Persistence
loading:0
rdb_changes_since_last_save:26691339
rdb_bgsave_in_progress:0
rdb_last_save_time:1603786491
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:1686233088
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0

# Stats
total_connections_received:186463
total_commands_processed:1896397112
instantaneous_ops_per_sec:2412
total_net_input_bytes:1922008408981
total_net_output_bytes:3988202435440
instantaneous_input_kbps:1893.15
instantaneous_output_kbps:4027.87
rejected_connections:0
sync_full:4
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:959116873
keyspace_misses:57646450
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:3001586
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:3
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0

# Replication
role:active-replica
master_global_link_status:up
master_host:10.22.233.157
master_port:7001
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:978346192
slave_priority:100
slave_read_only:0
connected_slaves:0
master_replid:9cbfd600d9f044feedd560141607eb4033068b68
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1017294483
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:52428800
repl_backlog_first_byte_offset:964865684
repl_backlog_histlen:52428800

# CPU
used_cpu_sys:24260.210691
used_cpu_user:67555.210273
used_cpu_sys_children:418.000992
used_cpu_user_children:8562.139188
server_threads:4
long_lock_waits:13235

# Modules

# Commandstats
cmdstat_dbsize:calls=5,usec=7,usec_per_call=1.40
cmdstat_set:calls=876104985,usec=3920669933,usec_per_call=4.48
cmdstat_replicaof:calls=1,usec=652,usec_per_call=652.00
cmdstat_config:calls=60999,usec=5810743,usec_per_call=95.26
cmdstat_get:calls=1016763323,usec=22421341376,usec_per_call=22.05
cmdstat_psync:calls=4,usec=609,usec_per_call=152.25
cmdstat_replconf:calls=1147640,usec=1813513,usec_per_call=1.58
cmdstat_del:calls=267052,usec=415409283,usec_per_call=1555.54
cmdstat_scan:calls=19298,usec=71326514,usec_per_call=3696.06
cmdstat_ping:calls=1285851,usec=936137,usec_per_call=0.73
cmdstat_latency:calls=60998,usec=154908,usec_per_call=2.54
cmdstat_command:calls=3,usec=2454,usec_per_call=818.00
cmdstat_client:calls=60998,usec=156672,usec_per_call=2.57
cmdstat_rreplay:calls=255400,usec=381776527,usec_per_call=1494.82
cmdstat_select:calls=17,usec=40,usec_per_call=2.35
cmdstat_slowlog:calls=121996,usec=572386,usec_per_call=4.69
cmdstat_auth:calls=186463,usec=1123128,usec_per_call=6.02
cmdstat_info:calls=62079,usec=6642961,usec_per_call=107.01

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=165327252,expires=0,avg_ttl=0
db1:keys=138655703,expires=0,avg_ttl=0

------ CLIENT LIST OUTPUT ------
id=186460 addr=10.139.58.12:38436 fd=101 name= age=107 idle=18 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping user=default
id=186462 addr=10.139.44.2:52536 fd=98 name= age=79 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=set user=default
id=186407 addr=10.139.44.3:36236 fd=89 name= age=311 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get user=default
id=186360 addr=10.139.58.13:44052 fd=100 name= age=635 idle=2 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping user=default
id=186455 addr=10.139.31.15:49680 fd=104 name= age=145 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get user=default
id=186361 addr=10.139.58.13:44060 fd=105 name= age=635 idle=2 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping user=default
id=186430 addr=10.139.58.12:46836 fd=121 name= age=220 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get user=default
id=186378 addr=10.139.31.15:51792 fd=113 name= age=538 idle=6 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping user=default
id=186470 addr=10.139.44.2:54444 fd=116 name= age=75 idle=19 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping user=default
id=186475 addr=10.139.44.3:55908 fd=103 name= age=11 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=get user=default
id=186394 addr=10.139.58.13:50836 fd=110 name= age=452 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=set user=default
id=186464 addr=10.139.31.14:58236 fd=99 name= age=75 idle=16 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping user=default
id=186468 addr=10.139.44.2:54422 fd=111 name= age=75 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=get user=default
id=186469 addr=10.139.44.2:54424 fd=114 name= age=75 idle=19 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping user=default
id=186449 addr=10.139.31.14:37538 fd=92 name= age=202 idle=16 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping user=default
id=180395 addr=10.22.233.157:7001 fd=91 name= age=37339 idle=0 flags=M db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=rreplay user=(superuser)
id=186465 addr=10.139.44.2:54280 fd=106 name= age=75 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=6801 qbuf-free=25967 obl=0 oll=0 omem=0 events=r cmd=set user=default
id=186466 addr=10.139.44.2:54282 fd=107 name= age=75 idle=8 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=get user=default
id=186467 addr=10.139.58.13:35906 fd=109 name= age=75 idle=2 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping user=default
id=186347 addr=10.139.31.15:41404 fd=97 name= age=765 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=set user=default
id=186446 addr=10.139.31.14:51018 fd=131 name= age=220 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=52 qbuf-free=32716 obl=0 oll=0 omem=0 events=r cmd=get user=default
id=186454 addr=10.139.58.12:33718 fd=102 name= age=146 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get user=default
id=186424 addr=10.139.44.3:44902 fd=115 name= age=220 idle=7 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=set user=default
id=186442 addr=10.139.44.2:45934 fd=125 name= age=220 idle=19 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping user=default
id=186427 addr=10.139.44.3:44928 fd=119 name= age=220 idle=11 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping user=default
id=186463 addr=10.139.31.14:57946 fd=93 name= age=75 idle=16 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping user=default

------ REGISTERS ------
10603:11460:S 10 Nov 2020 11:07:41.131 # 
RAX:0000000000000008 RBX:0000000000000008
RCX:00007fb65b3fdb90 RDX:0000000000000001
RDI:0000000000000000 RSI:00007fb65ff9a840
RBP:0000000000000008 RSP:00007fb65b3fcf30
R8 :00007fcf80000000 R9 :0000000000000000
R10:0000000000000006 R11:00007fd89ce00bd0
R12:00007fb65b3fcf50 R13:00007fcfb0725910
R14:0000000000000000 R15:0000000040000000
RIP:000055598527e30d EFL:0000000000010202
CSGSFS:002b000000000033
10603:11460:S 10 Nov 2020 11:07:41.131 # (00007fb65b3fcf3f) -> 000000003a4b5f42
10603:11460:S 10 Nov 2020 11:07:41.131 # (00007fb65b3fcf3e) -> 000000005faa747c
10603:11460:S 10 Nov 2020 11:07:41.131 # (00007fb65b3fcf3d) -> 00005559851d9f02
10603:11460:S 10 Nov 2020 11:07:41.131 # (00007fb65b3fcf3c) -> 000000000fffffff
10603:11460:S 10 Nov 2020 11:07:41.131 # (00007fb65b3fcf3b) -> 0000555985281de9
10603:11460:S 10 Nov 2020 11:07:41.131 # (00007fb65b3fcf3a) -> 0000000008000000
10603:11460:S 10 Nov 2020 11:07:41.131 # (00007fb65b3fcf39) -> 0000555985281de0
10603:11460:S 10 Nov 2020 11:07:41.131 # (00007fb65b3fcf38) -> 00007fd89ccaa000
10603:11460:S 10 Nov 2020 11:07:41.131 # (00007fb65b3fcf37) -> 000000000fffffff
10603:11460:S 10 Nov 2020 11:07:41.131 # (00007fb65b3fcf36) -> 0000000000000000
10603:11460:S 10 Nov 2020 11:07:41.131 # (00007fb65b3fcf35) -> 7c6191ea517da400
10603:11460:S 10 Nov 2020 11:07:41.131 # (00007fb65b3fcf34) -> 0000000000000001
10603:11460:S 10 Nov 2020 11:07:41.131 # (00007fb65b3fcf33) -> 0000555985281b28
10603:11460:S 10 Nov 2020 11:07:41.131 # (00007fb65b3fcf32) -> 00007fcb44f03af1
10603:11460:S 10 Nov 2020 11:07:41.131 # (00007fb65b3fcf31) -> 00007fd89ccaa000
10603:11460:S 10 Nov 2020 11:07:41.131 # (00007fb65b3fcf30) -> 00007fcb44f11558

------ MODULES INFO OUTPUT ------

------ FAST MEMORY TEST ------
10603:11460:S 10 Nov 2020 11:07:41.132 # Bio thread for job type #0 terminated
10603:11460:S 10 Nov 2020 11:07:41.132 # Bio thread for job type #1 terminated
10603:11460:S 10 Nov 2020 11:07:41.133 # Bio thread for job type #2 terminated
*** Preparing to test memory region 55598559b000 (2281472 bytes)
*** Preparing to test memory region 555985e8f000 (454656 bytes)
*** Preparing to test memory region 555985efe000 (81920 bytes)
*** Preparing to test memory region 7fb4dcb00000 (5816451072 bytes)
*** Preparing to test memory region 7fb637680000 (211423232 bytes)
*** Preparing to test memory region 7fb64807d000 (39321600 bytes)
*** Preparing to test memory region 7fb64a5fe000 (8388608 bytes)
*** Preparing to test memory region 7fb64adff000 (8388608 bytes)
*** Preparing to test memory region 7fb64b5ff000 (2097152 bytes)
*** Preparing to test memory region 7fb64b800000 (8388608 bytes)
*** Preparing to test memory region 7fb64c000000 (135168 bytes)
*** Preparing to test memory region 7fb650000000 (135168 bytes)
*** Preparing to test memory region 7fb654000000 (135168 bytes)
*** Preparing to test memory region 7fb658000000 (10485760 bytes)
*** Preparing to test memory region 7fb658bfc000 (8388608 bytes)
*** Preparing to test memory region 7fb6593fd000 (8388608 bytes)
*** Preparing to test memory region 7fb659bfe000 (8388608 bytes)
*** Preparing to test memory region 7fb65a3ff000 (8388608 bytes)
*** Preparing to test memory region 7fb65ac00000 (8388608 bytes)
*** Preparing to test memory region 7fb65b400000 (146981130240 bytes)
*** Preparing to test memory region 7fd898000000 (18874368 bytes)
*** Preparing to test memory region 7fd899341000 (8388608 bytes)
*** Preparing to test memory region 7fd899b42000 (8388608 bytes)
*** Preparing to test memory region 7fd89a343000 (8388608 bytes)
*** Preparing to test memory region 7fd89ad4b000 (8388608 bytes)
*** Preparing to test memory region 7fd89b757000 (24576 bytes)
*** Preparing to test memory region 7fd89b975000 (8192 bytes)
*** Preparing to test memory region 7fd89bd8d000 (9437184 bytes)
*** Preparing to test memory region 7fd89c800000 (8388608 bytes)
*** Preparing to test memory region 7fd89d367000 (188416 bytes)
*** Preparing to test memory region 7fd89d69d000 (4096 bytes)
*** Preparing to test memory region 7fd89d8e7000 (4096 bytes)
*** Preparing to test memory region 7fd89e373000 (4096 bytes)
*** Preparing to test memory region 7fd89e8a2000 (4096 bytes)
*** Preparing to test memory region 7fd89f1f1000 (4096 bytes)
*** Preparing to test memory region 7fd89f867000 (8192 bytes)
*** Preparing to test memory region 7fd89fea9000 (4096 bytes)
*** Preparing to test memory region 7fd8a0bd0000 (4096 bytes)
*** Preparing to test memory region 7fd8a15ca000 (8192 bytes)
*** Preparing to test memory region 7fd8a2470000 (16384 bytes)
*** Preparing to test memory region 7fd8a268f000 (16384 bytes)
*** Preparing to test memory region 7fd8a2fce000 (16384 bytes)
*** Preparing to test memory region 7fd8a349a000 (12288 bytes)
*** Preparing to test memory region 7fd8a41c1000 (114688 bytes)
*** Preparing to test memory region 7fd8a41e5000 (4096 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.23319:23319:C 10 Nov 2020 11:08:27.957 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.

To Reproduce
Unsure how to reproduce. The actions I'm doing are as follows:
1). Running 2 keydb in active-replica mode (no multi-master)
2). active-defrag is enabled
3). Connected to server A, delete keys from db1 at 100 keys per delete call (while serving get calls from db0).
4). Eventually keydb server A crashes.
5). keydb server B also crashed a little while later with same stacktrace.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions