Description of problem:
Two-node setup with replica 2 (mirrored) volumes and clients installed on both nodes. When one of the nodes becomes faulty, it is removed and replaced with a new node with the same name/IP. While the brick is being re-added, the active client crashes. The issue occurs randomly when SSL is enabled for I/O; it is not seen in non-SSL setups.
The exact command to reproduce the issue:
gluster volume add-brick efa_logs replica 2 10.18.120.135:/apps/opt/efa/logs force
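The add-brick command above is the last step of a longer node-replacement sequence. A sketch of the assumed full flow (volume name, IPs, and brick paths are taken from this report; the remove-brick/peer steps are assumptions about the replacement procedure, not verbatim from the setup):

```shell
# On the surviving node (10.18.120.136), assuming 10.18.120.135 is the
# faulty peer that was reinstalled with the same hostname/IP:

# 1. Drop the dead brick and detach the old peer (flags are an assumption).
gluster volume remove-brick efa_logs replica 1 \
    10.18.120.135:/apps/opt/efa/logs force
gluster peer detach 10.18.120.135 force

# 2. Re-probe the replacement node once it is back online.
gluster peer probe 10.18.120.135

# 3. Re-add the brick -- the step during which the active client crashes.
gluster volume add-brick efa_logs replica 2 \
    10.18.120.135:/apps/opt/efa/logs force

# 4. Trigger self-heal toward the new brick.
gluster volume heal efa_logs
```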
The full output of the command that failed:
Expected results:
add-brick should be successful
Mandatory info:
- The output of the gluster volume status
command (note: this output was pasted under the "volume info" heading in the template; the labels were swapped):
Status of volume: efa_certs
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.18.120.136:/apps/opt/efa/certs 52847 0 Y 34686
Brick 10.18.120.135:/apps/opt/efa/certs 54321 0 Y 33999
Self-heal Daemon on localhost N/A N/A Y 150192
Self-heal Daemon on 10.18.120.135 N/A N/A Y 34015
Task Status of Volume efa_certs
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: efa_logs
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.18.120.136:/apps/opt/efa/logs 56910 0 Y 34750
Brick 10.18.120.135:/apps/opt/efa/logs 56796 0 Y 34064
Self-heal Daemon on localhost N/A N/A Y 150192
Self-heal Daemon on 10.18.120.135 N/A N/A Y 34015
Task Status of Volume efa_logs
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: efa_misc
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.18.120.136:/apps/opt/efa/misc 55691 0 Y 34799
Brick 10.18.120.135:/apps/opt/efa/misc 58871 0 Y 34167
Self-heal Daemon on localhost N/A N/A Y 150192
Self-heal Daemon on 10.18.120.135 N/A N/A Y 34015
Task Status of Volume efa_misc
------------------------------------------------------------------------------
There are no active volume tasks
- The output of the gluster volume info
command (for the efa_logs volume; this output was pasted under the "volume status" heading in the template):
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.18.120.135:/apps/opt/efa/logs
Brick2: 10.18.120.136:/apps/opt/efa/logs
Options Reconfigured:
ssl.ca-list: /apps/efadata/glusterfs/glusterfs.extreme-ca-chain.pem
ssl.own-cert: /apps/efadata/glusterfs/glusterfs.pem
ssl.private-key: /apps/efadata/glusterfs/glusterfs.key.pem
ssl.cipher-list: HIGH:!SSLv2:!SSLv3:!TLSv1:!TLSv1.1:TLSv1.2:!3DES:!RC4:!aNULL:!ADH
auth.ssl-allow: 10.18.120.135,10.18.120.136
server.ssl: on
client.ssl: on
ssl.certificate-depth: 3
network.ping-timeout: 2
performance.open-behind: on
cluster.favorite-child-policy: mtime
storage.owner-gid: 1001
storage.owner-uid: 0
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
- The output of the gluster volume heal
command:
- Provide logs present on following locations of client and server nodes:
/var/log/glusterfs/
- Is there any crash? Provide the backtrace and coredump:
(gdb) bt
#0 0x00007fa6f731bbad in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#1 0x00007fa6f731fe1e in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#2 0x00007fa6f731d6d0 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#3 0x00007fa6f7324c45 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#4 0x00007fa6f732fa3f in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#5 0x00007fa6f732fb47 in SSL_read () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#6 0x00007fa6f739dc94 in ssl_do (buf=<optimized out>, len=<optimized out>, func=<optimized out>, priv=<optimized out>, priv=<optimized out>) at socket.c:246
#7 0x00007fa6f739de36 in __socket_ssl_readv (opvector=opvector@entry=0x7fa6f6abedd0, opcount=opcount@entry=1, this=<optimized out>, this=<optimized out>) at socket.c:552
#8 0x00007fa6f739e35b in __socket_ssl_read (count=<optimized out>, buf=<optimized out>, this=0x555685ba1b98) at socket.c:572
#9 __socket_cached_read (opcount=1, opvector=0x555685699338, this=0x555685ba1b98) at socket.c:610
#10 __socket_rwv (this=this@entry=0x555685ba1b98, vector=<optimized out>, count=count@entry=1, pending_vector=pending_vector@entry=0x5556856993a8, pending_count=pending_count@entry=0x5556856993b4, bytes=bytes@entry=0x7fa6f6abeea0,
write=0) at socket.c:721
#11 0x00007fa6f73a0438 in __socket_readv (bytes=0x7fa6f6abeea0, pending_count=0x5556856993b4, pending_vector=0x5556856993a8, count=1, vector=<optimized out>, this=0x555685ba1b98) at socket.c:2102
#12 __socket_read_frag (this=0x555685ba1b98) at socket.c:2102
#13 socket_proto_state_machine (pollin=<synthetic pointer>, this=0x555685ba1b98) at socket.c:2262
#14 socket_event_poll_in (notify_handled=true, this=0x555685ba1b98) at socket.c:2384
#15 socket_event_handler (event_thread_died=0, poll_err=0, poll_out=<optimized out>, poll_in=<optimized out>, data=0x555685ba1b98, gen=13, idx=2, fd=<optimized out>) at socket.c:2790
#16 socket_event_handler (fd=fd@entry=6, idx=idx@entry=2, gen=gen@entry=13, data=data@entry=0x555685ba1b98, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0) at socket.c:2710
#17 0x00007fa6fbade119 in event_dispatch_epoll_handler (event=0x7fa6f6abf054, event_pool=0x555685006018) at event-epoll.c:614
#18 event_dispatch_epoll_worker (data=0x555685036828) at event-epoll.c:725
#19 0x00007fa6fb9fa609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#20 0x00007fa6fb74b133 in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) f 5
#5 0x00007fa6f732fb47 in SSL_read () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
(gdb) info locals
No symbol table info available.
(gdb) f 9
#9 __socket_cached_read (opcount=1, opvector=0x555685699338, this=0x555685ba1b98) at socket.c:610
610 socket.c: No such file or directory.
(gdb) info locals
ret = -1
priv = 0x555685699218
in = 0x555685699318
req_len = 8
priv = <optimized out>
in = <optimized out>
req_len = <optimized out>
ret = <optimized out>
(gdb) l
605 in socket.c
(gdb) f 7
#7 0x00007fa6f739de36 in __socket_ssl_readv (opvector=opvector@entry=0x7fa6f6abedd0, opcount=opcount@entry=1, this=<optimized out>, this=<optimized out>) at socket.c:552
552 in socket.c
(gdb) info locals
priv = 0x555685699218
sock = <optimized out>
ret = -1
__FUNCTION__ = "__socket_ssl_readv"
(gdb) f 15
#15 socket_event_handler (event_thread_died=0, poll_err=0, poll_out=<optimized out>, poll_in=<optimized out>, data=0x555685ba1b98, gen=13, idx=2, fd=<optimized out>) at socket.c:2790
2790 in socket.c
(gdb) l
2785 in socket.c
(gdb) info locals
this = <optimized out>
ret = <optimized out>
ctx = <optimized out>
notify_handled = <optimized out>
priv = 0x555685699218
socket_closed = <optimized out>
this = <optimized out>
priv = <optimized out>
ret = <optimized out>
ctx = <optimized out>
socket_closed = <optimized out>
notify_handled = <optimized out>
__FUNCTION__ = "socket_event_handler"
sock_type = <optimized out>
sa = <optimized out>
(gdb)
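The `??` frames in libssl and the repeated "in socket.c: No such file or directory" / "No symbol table info available" messages show the trace above was taken without debug symbols or sources. On Ubuntu, installing the dbgsym packages before re-opening the core would make the backtrace far more useful (package names and the ddebs setup below are assumptions and vary by release):

```shell
# Enable the Ubuntu debug-symbol archive (ddebs) -- adjust the release name
# and install the ubuntu-dbgsym-keyring if apt complains about signatures.
echo "deb http://ddebs.ubuntu.com $(lsb_release -cs) main restricted universe multiverse" | \
    sudo tee /etc/apt/sources.list.d/ddebs.list
sudo apt-get update

# Debug symbols for OpenSSL and GlusterFS (exact package names are assumptions).
sudo apt-get install libssl1.1-dbgsym glusterfs-dbg

# Re-open the core with symbols and capture a full backtrace of all threads.
gdb /usr/sbin/glusterfs /path/to/core \
    -ex 'thread apply all bt full' -ex quit
```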
Additional info:
- The operating system / glusterfs version:
It is reproducible with GlusterFS versions 9.6 and 11.0 on Ubuntu, installed from Debian packages.