
feat: Add comprehensive NCCL operations support for CubeCL CUDA runtime #792


Open
wants to merge 2 commits into main
Conversation

Use-AIrs (Contributor)

Ok, so I made the first NCCL implementation and it works now. The design isn't meant for bigger clusters since it's a single-threaded async approach: it works for small device counts, but not really for HPC. I'll be able to make this more interesting for HPC in a few weeks... but for now this is a usable, straightforward approach without grouping.

Tests pass, there's a short doc entry to go with them, and no more memory leaks :)

Use-AIrs (Contributor Author)

So I took a step back and went with the grouping abstraction as discussed. Hope that's more the direction :)

Comment on lines +310 to +313
pub async fn barrier(&self) {
for (_, device) in &self.map {
device.client.sync().await;
}
Member


We should await in parallel instead of sequentially, using join_all or something equivalent.
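For illustration, a minimal sketch of the parallel version using futures::future::join_all; the Device/Client types below are stand-ins, assuming self.map is a HashMap and client.sync() returns a future:

use std::collections::HashMap;
use futures::future::join_all;

// Stand-ins for the real types in this PR (assumptions, not the actual API).
struct Client;
impl Client {
    async fn sync(&self) {
        // The real client would wait here for the device queue to drain.
    }
}
struct Device {
    client: Client,
}

struct DeviceGroup {
    map: HashMap<usize, Device>,
}

impl DeviceGroup {
    // Create every sync future up front, then await them all concurrently
    // instead of one after another.
    pub async fn barrier(&self) {
        join_all(self.map.values().map(|device| device.client.sync())).await;
    }
}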

Contributor Author


Fair

let expected_recv_size = send_data.len() * nccl_op.device_count;
let cu_device = CudaDevice { index: 0 };
let client = <CudaRuntime as Runtime>::client(&cu_device);
let send_handle = create_test_data(&client, send_data.clone());
Member


It's weird to test the send/recv on the same device, no?

Use-AIrs (Contributor Author), Jul 28, 2025

Just got one GPU ^^ I'm getting my cluster later this week, but I had to test somehow.
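Once the cluster is up, a two-device variant might look roughly like this; the second device index and the placement of create_test_data are assumptions based on the snippet above, and the actual NCCL send/recv entry points in this PR may differ:

// Hypothetical two-device setup; indices and helpers are illustrative only.
let send_device = CudaDevice { index: 0 };
let recv_device = CudaDevice { index: 1 };
let send_client = <CudaRuntime as Runtime>::client(&send_device);
let recv_client = <CudaRuntime as Runtime>::client(&recv_device);

// Stage the payload on device 0 only; the receive buffer lives on device 1,
// so the test exercises a real NCCL transfer instead of a same-device copy.
let send_handle = create_test_data(&send_client, send_data.clone());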
