
Error in mlx5dv_create_qp in the DC transport #5749

@lyu

Describe the bug

Both ucx_info and ucx_perftest report: dc_mlx5.c:329 UCX ERROR mlx5dv_create_qp(mlx5_0:1, DCI): failed: Invalid argument.

Steps to Reproduce

UCX version: UCT version=1.10.0 revision c7add93
UCX build config: --prefix=$PREFIX --enable-debug --enable-assertions --enable-params-check --enable-frame-pointer --enable-backtrace-detail

Setup and versions

  • lsb_release -a:
      LSB Version:    :core-4.1-aarch64:core-4.1-noarch
      Distributor ID: CentOS
      Description:    CentOS Linux release 8.1.1911 (Core)
      Release:        8.1.1911
      Codename:       Core
  • ofed_info -s: MLNX_OFED_LINUX-5.1-0.6.6.0
  • rpm -q rdma-core: rdma-core-51mlnx1-1.51066.aarch64
  • rpm -q libibverbs: libibverbs-51mlnx1-1.51066.aarch64

Additional information (depending on the issue)

For ucx_info -d, this happens when it tries to print info about the dc_mlx5 transport.
For ucx_perftest, it happens when running any UCP test with no environment variables set.
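
To make the reproduction easier to check, here is a minimal sketch that scans tool output for the error quoted above. The helper name has_dc_qp_error and the variable DC_QP_ERROR are hypothetical. The pattern is taken from the single error line in this report, so it may need adjusting if the message differs on other setups.

```shell
# Pattern derived from the error line quoted in this report (basic regex,
# so the parentheses are matched literally).
DC_QP_ERROR='mlx5dv_create_qp(.*DCI): failed'

# Hypothetical helper: exits 0 if the text on stdin contains the DCI error.
has_dc_qp_error() {
    grep -q "$DC_QP_ERROR"
}

# Usage on a machine with UCX installed (left commented out here):
# ucx_info -d 2>&1 | has_dc_qp_error && echo "DCI creation failed"
```

Redirecting stderr into the pipe is a precaution, since the report does not say which stream the error is written to.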

The errors go away if I pass --without-dc to the configure script.
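
For reference, the workaround build can be sketched as follows. The flags are copied from the build config listed above, with --without-dc appended. The default value for PREFIX is an assumption for illustration only, and the command is printed rather than executed.

```shell
# Assumption: fall back to an illustrative install prefix if PREFIX is unset.
PREFIX="${PREFIX:-/tmp/ucx-install}"

# Flags from the build config in this report, plus the workaround flag.
CONFIGURE_FLAGS="--prefix=$PREFIX --enable-debug --enable-assertions --enable-params-check --enable-frame-pointer --enable-backtrace-detail --without-dc"

# Print the command instead of running it; run it from the UCX source tree.
echo "./configure $CONFIGURE_FLAGS"
```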

This doesn't happen with UCX 1.9.0, where the dc transport is enabled and works correctly.

This also doesn't happen when UCX is built against MLNX_OFED_LINUX-4.5-1.0.1.0 on another ThunderX2 machine, but it looks like dc is automatically disabled there.
