Conversation

@tstruk tstruk commented Jun 30, 2023

No need to call MPI_Comm_split if the NCCL_TESTS_SPLIT_MASK environment variable is not set.
It only causes unnecessary traffic with no real effect, and in some cases might cause the ranks to be invalid.

Signed-off-by: Tadeusz Struk <[email protected]>
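
For illustration only, a minimal sketch of the kind of guard this change describes (not the actual patch in this PR; the NCCL_TESTS_SPLIT_MASK parsing and the maskStr name are assumptions, while color, proc, mpi_comm, ncclProc and ncclProcs are the names used in src/common.cu as shown in the debug diff further down):

/* Sketch of a guarded split inside run() in src/common.cu.
 * Only split when a non-zero mask was requested; otherwise reuse
 * MPI_COMM_WORLD so ranks and sizes stay the world values. */
char *maskStr = getenv("NCCL_TESTS_SPLIT_MASK");   /* hex parsing is an assumption */
if (maskStr && strtoul(maskStr, NULL, 16) != 0) {
  MPI_Comm_split(MPI_COMM_WORLD, color, proc, &mpi_comm);
} else {
  mpi_comm = MPI_COMM_WORLD;
}
MPI_Comm_size(mpi_comm, &ncclProcs);
MPI_Comm_rank(mpi_comm, &ncclProc);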

sjeaugey commented Jul 3, 2023

in some cases might cause the ranks to be invalid

Can you say more about this?

tstruk commented Jul 3, 2023

Can you say more about this?

Hi,
Yes. This is running on MPICH 4.1.2 and libfabric 1.18 with a custom provider, NCCL version 2.18.3, across 4 nodes.
Sometimes, after calling MPI_Comm_split() with the color parameter equal to 0, the rank reported on one of the nodes is incorrect.
The color is 0 because I don't set the NCCL_TESTS_SPLIT_MASK environment variable.
Here is the debug code I added:

diff --git a/src/common.h b/src/common.h
index 20fa461..d2359ea 100644
--- a/src/common.h
+++ b/src/common.h
@@ -282,5 +282,11 @@ static int ncclstringtoop (char *str) {
 extern int is_main_proc;
 extern thread_local int is_main_thread;
 #define PRINT if (is_main_thread) printf
+#define PRINT_ALL_NODES(STR, ...)      \
+({                                     \
+  char hn[256];                        \
+  gethostname(hn, 256);                \
+  printf("%s: " STR "\n", hn, ## __VA_ARGS__); \
+})

and

diff --git a/src/common.cu b/src/common.cu
index 48a629c..a7531f6 100644
--- a/src/common.cu
+++ b/src/common.cu
@@ -852,6 +852,8 @@ testResult_t run() {
 #ifdef MPI_SUPPORT
   MPI_Comm_size(MPI_COMM_WORLD, &totalProcs);
   MPI_Comm_rank(MPI_COMM_WORLD, &proc);
+PRINT_ALL_NODES("proc %d, totalProcs %d\n", proc, totalProcs);
+
   uint64_t hostHashs[totalProcs];
   hostHashs[proc] = getHostHash(hostname);
   MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, hostHashs, sizeof(uint64_t), MPI_BYTE, MPI_COMM_WORLD);
@@ -867,6 +869,8 @@ testResult_t run() {
   MPI_Comm_split(MPI_COMM_WORLD, color, proc, &mpi_comm);
   MPI_Comm_size(mpi_comm, &ncclProcs);
   MPI_Comm_rank(mpi_comm, &ncclProc);
+PRINT_ALL_NODES("ncclProc %d, ncclProcs %d\n", ncclProc, ncclProcs);
+
 #endif
   is_main_thread = is_main_proc = (proc == 0) ? 1 : 0;

and here is what I see:

Node1: proc 1, totalProcs 4 
Node3: proc 3, totalProcs 4 
Node2: proc 2, totalProcs 4 
Node0: proc 0, totalProcs 4 
Node1: ncclProc 1, ncclProcs 4 
Node3: ncclProc 3, ncclProcs 4 
Node2: ncclProc 0, ncclProcs 2 
Node0: ncclProc 0, ncclProcs 4 

The MPICH documentation [1] says that the color should be a "non-negative integer", so 0 should be fine, but here it is not handled correctly.
When this happens the test hangs after the NCCL init phase.
After the fix is applied, everything works as expected.

[1] - https://www.mpich.org/static/docs/v4.1.2/www3/MPI_Comm_split.html
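
For reference, here is a minimal standalone reproducer of the same pattern (hypothetical, not part of nccl-tests): every rank splits MPI_COMM_WORLD with color 0 and key equal to its world rank, so per the MPI standard the new communicator should report the same size and the same rank ordering as MPI_COMM_WORLD on every node.

/* split_repro.c: all ranks use color 0, key = world rank. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  int worldRank, worldSize;
  MPI_Comm_rank(MPI_COMM_WORLD, &worldRank);
  MPI_Comm_size(MPI_COMM_WORLD, &worldSize);

  /* Everyone uses color 0; key = world rank preserves the ordering. */
  MPI_Comm split;
  MPI_Comm_split(MPI_COMM_WORLD, 0, worldRank, &split);

  int splitRank, splitSize;
  MPI_Comm_rank(split, &splitRank);
  MPI_Comm_size(split, &splitSize);

  /* Expected: split rank/size equal to world rank/size on every rank. */
  printf("world %d/%d -> split %d/%d\n", worldRank, worldSize, splitRank, splitSize);

  MPI_Comm_free(&split);
  MPI_Finalize();
  return 0;
}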
