
io_uring errors when starting glusterd #4214

@patabid

Description of problem:

We have a three-server glusterfs setup. When starting glusterd, the service frequently fails to start with the following error logged:

C [gf-io-uring.c:612:gf_io_uring_cq_process_some] (-->/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x7ff76) [0x7f194fc22f76] -->/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x8bf15) [0x7f194fc2ef15] -->/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x8bdd5) [0x7f194fc2edd5] ) 0-: Assertion failed:

The service will typically start and run after several attempts. It then runs stably for about two weeks before crashing.

All three servers are identical, down to the BIOS versions.

The exact command to reproduce the issue:

$ sudo systemctl start glusterd

The full output of the command that failed:

Job for glusterd.service failed because the control process exited with error code.                                                                                          
See "systemctl status glusterd.service" and "journalctl -xeu glusterd.service" for details. 

Running journalctl -xeu glusterd.service gives this output:

Jul 31 14:53:18 srv-003 glusterd[1582227]: [2023-07-31 14:53:18.894347 +0000] C [gf-io-uring.c:612:gf_io_uring_cq_process_some] (-->/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x7ff76) [0x7f194fc22f76] -->/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x8bf15) [0x7f194fc2ef15] -->/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x8bdd5) [0x7f194fc2edd5] ) 0-: Assertion failed:
Jul 31 14:53:18 srv-003 glusterd[1582227]: pending frames:
Jul 31 14:53:18 srv-003 glusterd[1582227]: patchset: git://git.gluster.org/glusterfs.git
Jul 31 14:53:18 srv-003 glusterd[1582227]: signal received: 6
Jul 31 14:53:18 srv-003 glusterd[1582227]: time of crash:
Jul 31 14:53:18 srv-003 glusterd[1582227]: 2023-07-31 14:53:18 +0000
Jul 31 14:53:18 srv-003 glusterd[1582227]: configuration details:
Jul 31 14:53:18 srv-003 glusterd[1582227]: argp 1
Jul 31 14:53:18 srv-003 glusterd[1582227]: backtrace 1
Jul 31 14:53:18 srv-003 glusterd[1582227]: dlfcn 1
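
If it would help, I can also run glusterd in the foreground with debug logging the next time it fails, along these lines (assuming the --debug option behaves the same on this build):

$ sudo systemctl stop glusterd    # stop the unit so the foreground instance can take over
$ sudo glusterd --debug           # run in the foreground with debug-level logging to the terminal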

Expected results:
No output and glusterd running

Mandatory info:
- The output of the gluster volume info command:

Volume Name: vol03
Type: Distributed-Disperse
Volume ID: 49f0d0cd-3335-4e08-ae1e-fb56d2a7d685
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: srv-001:/srv/glusterfs/vol03/brick0
Brick2: srv-002:/srv/glusterfs/vol03/brick0
Brick3: srv-003:/srv/glusterfs/vol03/brick0
Options Reconfigured:
performance.cache-size: 1GB
storage.linux-io_uring: off
server.event-threads: 4
client.event-threads: 4
performance.write-behind: off
performance.parallel-readdir: on
performance.readdir-ahead: on
performance.nl-cache-timeout: 600
performance.nl-cache: on
network.inode-lru-limit: 200000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
performance.cache-samba-metadata: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
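
Note that io_uring is already disabled at the volume level via storage.linux-io_uring (as far as I understand, that option only affects the brick processes, while the assertion above is raised by glusterd itself). For reference, this is roughly how the option was set and can be re-checked (illustrative commands):

$ sudo gluster volume set vol03 storage.linux-io_uring off    # already applied, shown under "Options Reconfigured" above
$ sudo gluster volume get vol03 storage.linux-io_uring        # verify the current value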

- The output of the gluster volume status command:

** This is after the glusterd service has successfully started and is running!

Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick srv-001:/srv/glusterfs/vol03/brick0   54477     0          Y       5564
Brick srv-002:/srv/glusterfs/vol03/brick0   58095     0          Y       4288
Brick srv-003:/srv/glusterfs/vol03/brick0   50589     0          Y       5319
Self-heal Daemon on localhost               N/A       N/A        Y       1582991
Self-heal Daemon on srv-002                 N/A       N/A        Y       4323
Self-heal Daemon on srv-001                 N/A       N/A        Y       7260
 
Task Status of Volume vol03
------------------------------------------------------------------------------
There are no active volume tasks

- The output of the gluster volume heal command:

Brick srv-001:/srv/glusterfs/vol03/brick0
Status: Connected
Number of entries: 0

Brick srv-002:/srv/glusterfs/vol03/brick0
Status: Connected
Number of entries: 0

Brick srv-003:/srv/glusterfs/vol03/brick0
Status: Connected
Number of entries: 0

- Provide logs present on the following locations of client and server nodes:
/var/log/glusterfs/
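
I can attach these from each node; this is roughly how I would collect them (default log directory assumed):

$ sudo tail -n 500 /var/log/glusterfs/glusterd.log                    # most recent glusterd messages
$ sudo tar czf glusterfs-logs-$(hostname).tar.gz /var/log/glusterfs/  # bundle everything for attaching here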

- Is there any crash? Provide the backtrace and coredump:

Not sure how to do this; happy to provide it if someone can point me in the right direction for what is needed.
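
If the usual systemd-coredump route is what is needed, I can try something like the following after the next crash (package names and steps assumed from general Ubuntu documentation, not yet verified on these nodes):

$ sudo apt install systemd-coredump gdb    # capture cores and inspect them; matching debug symbols would help too
$ coredumpctl list glusterd                # after a crash, list any captured glusterd cores
$ sudo coredumpctl gdb glusterd            # open the most recent core in gdb
(gdb) thread apply all bt full             # then dump a full backtrace of every thread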

Additional info:

Each server has mostly identical hardware composed of the following:
CPU: AMD Ryzen 7 5700G
RAM: two servers have 16 GB and one has 32 GB (this is the only variance)
Storage:

  • 2x NVMe drives per server, each a 2 TB Samsung 970 EVO Plus

The entire storage stack:

  • EFI partition table per drive
  • the primary drive is the boot drive
  • the primary drive has a 1.8T LVM partition (after the system boot partitions)
  • the second drive has a matching 1.8T LVM partition
  • 1x volume group contains these partitions
  • a 1.15TiB logical volume in LVM RAID 1 across the two drives hosts the gluster brick on each server
  • the LV is encrypted using cryptsetup with LUKS
  • the encrypted LV is then opened via /dev/mapper
  • the mapped device is formatted with an XFS file system
  • this file system is then mounted and served by glusterd as the brick

This is a complex setup driven by a client's security policies, though the RAID setup can be removed if needed; a rough command-level equivalent is sketched below.
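
For reference, the stack on each server is roughly equivalent to the following (device, volume group, and LV names are placeholders, not the real ones):

$ sudo vgcreate vg_gluster /dev/nvme0n1p3 /dev/nvme1n1p1           # one VG over the two 1.8T partitions
$ sudo lvcreate --type raid1 -m 1 -L 1.15T -n lv_brick vg_gluster  # mirrored LV across both drives
$ sudo cryptsetup luksFormat /dev/vg_gluster/lv_brick              # LUKS-encrypt the LV
$ sudo cryptsetup open /dev/vg_gluster/lv_brick brick_crypt        # map it under /dev/mapper
$ sudo mkfs.xfs /dev/mapper/brick_crypt                            # XFS on the mapped device
$ sudo mount /dev/mapper/brick_crypt /srv/glusterfs/vol03          # mounted path that holds the brick0 directory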

- The operating system / glusterfs version:

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 23.04
Release:        23.04
Codename:       lunar
# glusterfs --version
glusterfs 11.0
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.

Last week we were running glusterfs 10.4 with exactly the same issues. We upgraded to 11.0 this weekend to see whether that would provide a fix; there has been no change in behavior.
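
Happy to add any other environment details if useful, for example the kernel and package versions on each node (illustrative commands):

$ uname -r                       # io_uring behaviour can vary a lot with the kernel version
$ dpkg -l | grep -i glusterfs    # confirm every node is running the same 11.0 packages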
