feat: Add comprehensive NCCL operations support for CubeCL CUDA runtime #792

Use-AIrs · 2025-07-27T08:30:36Z

OK, so I made the first NCCL implementation, and it works now. The design isn't meant for bigger clusters, since it uses a single-threaded async approach. It works for small device counts but not really for HPC; I'll be able to make it more interesting for HPC in a few weeks. For now, this is a usable, straightforward approach without grouping.

Tests pass, I added a short doc entry covering it, and there are no more memory leaks :)

Use-AIrs · 2025-07-27T08:36:39Z

So I took a step back to the grouping abstraction, as discussed. Hope that's more the direction :)
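Assuming "grouping" here refers to NCCL's group semantics (the thread does not spell this out): NCCL's documentation states that when a single thread manages several devices, the per-device calls must be wrapped in `ncclGroupStart()`/`ncclGroupEnd()`, because each NCCL call may otherwise block waiting for peers whose calls have not been issued yet. A minimal sketch of that idea as an RAII guard in plain Rust follows. `Comm`, `Group`, `Op`, and `all_reduce` are hypothetical mock names, not CubeCL or NCCL APIs, and nothing is sent to a GPU; ops are only recorded so the batching is visible.

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical record of one per-device collective call (illustrative only).
#[derive(Debug, Clone, PartialEq)]
struct Op {
    device: usize,
    name: &'static str,
}

/// Hypothetical communicator driven by a single host thread.
#[derive(Default)]
struct Comm {
    depth: Cell<usize>,              // open groups, like nested ncclGroupStart calls
    pending: RefCell<Vec<Op>>,       // ops queued while a group is open
    launched: RefCell<Vec<Vec<Op>>>, // batches actually submitted together
}

/// RAII guard: dropping the outermost guard submits every queued op as one
/// batch, playing the role of ncclGroupEnd.
struct Group<'a>(&'a Comm);

impl Comm {
    fn group(&self) -> Group<'_> {
        self.depth.set(self.depth.get() + 1);
        Group(self)
    }

    fn all_reduce(&self, device: usize) {
        let op = Op { device, name: "all_reduce" };
        if self.depth.get() > 0 {
            self.pending.borrow_mut().push(op); // deferred until the group closes
        } else {
            self.launched.borrow_mut().push(vec![op]); // ungrouped: launched alone
        }
    }
}

impl Drop for Group<'_> {
    fn drop(&mut self) {
        let comm = self.0;
        comm.depth.set(comm.depth.get() - 1);
        if comm.depth.get() == 0 {
            let batch: Vec<Op> = comm.pending.borrow_mut().drain(..).collect();
            if !batch.is_empty() {
                comm.launched.borrow_mut().push(batch);
            }
        }
    }
}

fn main() {
    let comm = Comm::default();
    {
        let _group = comm.group();
        for device in 0..3 {
            comm.all_reduce(device);
        }
    } // group closes here: all three ops are submitted as one batch
    comm.all_reduce(0); // outside a group: a batch of one
    println!("{:?}", comm.launched.borrow());
}
```

Tying the end of the group to scope exit means an early return cannot leave a group open; a real binding would presumably call `ncclGroupEnd` in `drop` instead of recording batches.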

nathanielsimard · 2025-07-28T14:20:26Z

crates/cubecl-cuda/src/compute/nccl.rs

+    pub async fn barrier(&self) {
+        for (_, device) in &self.map {
+            device.client.sync().await;
+        }


We should await these in parallel instead of sequentially, using join_all or something equivalent.
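A minimal sketch of the suggested change, using only the standard library so it runs without the `futures` crate. `MockSync` is a hypothetical stand-in for `device.client.sync()` whose simulated work starts on the first poll (Rust futures are lazy), and `join_all`/`block_on` are small hand-rolled helpers; in real code, `futures::future::join_all` would replace the helper. With three 100 ms syncs, the barrier should finish in roughly one delay instead of three.

```rust
use std::future::{poll_fn, Future};
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for `device.client.sync()`: the simulated device
/// work starts when first polled and takes `delay` to finish.
struct MockSync {
    device: usize,
    delay: Duration,
    started: bool,
    shared: Arc<Mutex<(bool, Option<Waker>)>>, // (done, waker to notify)
}

impl Future for MockSync {
    type Output = usize;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<usize> {
        let this = self.get_mut();
        {
            let mut s = this.shared.lock().unwrap();
            if s.0 {
                return Poll::Ready(this.device);
            }
            s.1 = Some(cx.waker().clone());
        }
        if !this.started {
            this.started = true;
            let (shared, delay) = (Arc::clone(&this.shared), this.delay);
            thread::spawn(move || {
                thread::sleep(delay);
                let mut s = shared.lock().unwrap();
                s.0 = true;
                if let Some(w) = s.1.take() {
                    w.wake();
                }
            });
        }
        Poll::Pending
    }
}

/// Minimal `join_all`: each wake-up polls every still-pending future, so all
/// of them make progress concurrently. Outputs keep the input order.
async fn join_all<F: Future + Unpin>(mut futs: Vec<F>) -> Vec<F::Output> {
    let mut outs: Vec<Option<F::Output>> = futs.iter().map(|_| None).collect();
    poll_fn(|cx| {
        let mut pending = false;
        for (f, out) in futs.iter_mut().zip(outs.iter_mut()) {
            if out.is_none() {
                match Pin::new(f).poll(cx) {
                    Poll::Ready(v) => *out = Some(v),
                    Poll::Pending => pending = true,
                }
            }
        }
        if pending {
            Poll::Pending
        } else {
            Poll::Ready(outs.iter_mut().map(|o| o.take().unwrap()).collect::<Vec<_>>())
        }
    })
    .await
}

/// Tiny executor: park the thread until some future calls `wake`.
struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
        thread::park(); // spurious wake-ups are harmless: we simply poll again
    }
}

/// The reviewed `barrier`, rewritten to await all device syncs concurrently.
async fn barrier(devices: &[usize], delay: Duration) -> Vec<usize> {
    let syncs = devices.iter().map(|&device| MockSync {
        device, delay, started: false, shared: Arc::default(),
    });
    join_all(syncs.collect()).await
}

fn main() {
    let start = Instant::now();
    let synced = block_on(barrier(&[0, 1, 2], Duration::from_millis(100)));
    println!("synced devices {:?} in {:?}", synced, start.elapsed());
}
```

Awaiting each `MockSync` in a `for` loop instead would start the next sync only after the previous one finished, which is the sequential behaviour the review comment points out.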

nathanielsimard · 2025-07-28T14:21:53Z

crates/cubecl-cuda/src/compute/nccl.rs

+            let expected_recv_size = send_data.len() * nccl_op.device_count;
+            let cu_device = CudaDevice { index: 0 };
+            let client = <CudaRuntime as Runtime>::client(&cu_device);
+            let send_handle = create_test_data(&client, send_data.clone());


Isn't it odd to test send/recv on the same device?

Use-AIrs · 2025-07-28

I only have one GPU ^^ I'm getting my cluster later this week, but I had to test it somehow.

feat: Add comprehensive NCCL operations support for CubeCL CUDA runtime

19f218d

Added disclaimer

5de0d51

nathanielsimard reviewed Jul 28, 2025
