
Conversation

amd-ivaganev

Ultra & Unified CCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., IBGDA), with two key focuses:

  • Flexibility for high performance in fast-evolving ML workloads
  • Portability for connecting heterogeneous GPUs in ML workloads

For collectives, UCCL-collective serves as a drop-in replacement for NCCL/RCCL (e.g., requiring no changes to application code), and significantly outperforms them in both latency and throughput across various settings.

Motivation

Add UCCL as an external project to TheRock.

UCCL provides a performant and easy-to-use network congestion control plugin for RCCL collectives, with significant benefits for training and inference at scale. It also contains a performant transfer engine that is actively being developed by academic partners.

Lowering the barrier to clients' use of these has obvious benefits for AMD.

Technical Details

This PR is small and localized -- it only adds a python build script for the UCCL external project and a README.md file.

The majority of the work has already been pushed and merged into the upstream UCCL project and consisted of modifying its build scripts to accommodate a new therock build target parameterized in a way that makes inclusion here (as an external project) easy.

Test Plan

The build script and all its options have been tested with the (already merged) upstream functionality.

Test Result

Tests of cloning and building succeed and produce manylinux wheels of UCCL dependent on TheRock wheels.

Submission Checklist

Member

@marbre marbre left a comment


Thanks for the PR. @ScottTodd is on vacation, so I gave it a very quick first look. Also adding @stellaraccident to have a higher-level look and for visibility.

When building pytorch, we do a checkout separate from building the wheels. This might be beneficial here as well, as it allows us to not clone the repo every time and to iterate separately on the build if one wants to. Wdyt?

Comment on lines +20 to +21
UCCL builds in an Ubuntu 22.04 docker container and outputs a
manylinux wheel to a directory named `wheelbase-therock` in the repo
Member


Ubuntu 22.04 vs. manylinux wheel doesn't sound right.

Author

@amd-ivaganev amd-ivaganev Oct 6, 2025


I thought about that quite a bit, actually. The upstream UCCL project also has nvidia and rocm targets, along with some flavor of Ubuntu 22.04 container with those frameworks pre-installed, and they've worked out the apt dependencies their code needs when preparing their build container image. Using a plain Ubuntu 22.04 image as the base for the therock target was the least disruptive thing to propose for their repo.

Their build flow already has an auditwheel-based post-processing step that converts the artifact wheel to a manylinux one, and I've tested it in combination with TheRock manylinux packages. Not being a Python packaging expert, I wasn't quite sure whether the upheaval of forcing their build into the same container image would buy us anything, since the output in both cases is a manylinux wheel.
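
Rough sketch only: as I understand it, that post-processing step boils down to an auditwheel repair call along these lines (the platform tag and exact invocation are my guesses, not necessarily what the upstream scripts do; "wheelbase-therock" is the output directory mentioned in the README):

```python
# Rough sketch of an auditwheel-based post-processing step (assumed flags).
import glob
import subprocess

for wheel in glob.glob("dist/*.whl"):
    subprocess.run(
        [
            "auditwheel", "repair", wheel,
            "--plat", "manylinux_2_28_x86_64",  # assumed target platform tag
            "-w", "wheelbase-therock",          # output directory named in the README
        ],
        check=True,
    )
```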

Also, it's not easy to figure out the correct matching PyPA image, since its specifics are hidden in the checkout scripts' Python, and marrying those checkout scripts with the upstream UCCL build scripts was a bad fit.

Collaborator


In general, the combination of a manylinux image and auditwheel is required to build the most portable artifacts. Just subbing in Ubuntu is rarely a sufficient replacement for using a RHEL/Alma DevToolset (which is what manylinux is based on). There are several things done in those toolchains that increase their compatibility. While it is possible to emulate that on other Linuxes, it can get quite nit-picky to audit properly. Some of those compatibility guarantees come from how RHEL devtoolsets surgically extend compatible glibc/libstdc++ builds of the libraries -- it isn't just something you can "convert" to with manylinux. It is rooted in very specific ways those toolchains are set up.

For maximum compatibility, I'd recommend using our manylinux build docker (or a derivative). That is mostly a stock manylinux docker with a couple of alterations:

  • We install some common sense dev tools.
  • We downgrade to devtoolset 12. Upstream manylinux rolls to the newest devtoolset eagerly, but we stay pinned to a version that we coordinate ROCm- and PyTorch-wide.

Also, the manylinux image already has all Python versions installed in it and can be used to build for any version.

Author


@stellaraccident I figured out how to adapt their build to our manylinux container and tracked down RHEL equivalents for some extra packages that they need for their build, but now I am running into another problem.

Trying to load the resulting package on Ubuntu 22.04 runs into a missing library, libgflags.so.2.1, though the Ubuntu host has libgflags.so.2.2. I'm confused about how manylinux Python packaging is supposed to work with the inevitable shared-object versioning mismatches across distributions.

Any advice?

Collaborator


auditwheel is supposed to surgically include such DSOs (the procedure it uses is to copy the dep into the wheel, change the SONAME to a unique name, and update all things that depend on it to use a relative RPATH and the updated SONAME). However, it is sometimes finicky.
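
Roughly, that grafting procedure is equivalent to something like the following (illustration only, not auditwheel's actual implementation; every path and name here is hypothetical):

```python
# Illustration of the grafting steps described above, using patchelf.
import shutil
import subprocess

def graft_dependency(dep_path, old_soname, mangled_soname, dependents, libs_dir):
    # 1. Copy the system library into the wheel's bundled-libs directory and
    #    give the copy a unique SONAME so it can't collide with a system one.
    bundled = f"{libs_dir}/{mangled_soname}"
    shutil.copy2(dep_path, bundled)
    subprocess.run(["patchelf", "--set-soname", mangled_soname, bundled], check=True)
    # 2. Rewrite each dependent to need the mangled name and to search a
    #    relative RPATH so the loader finds the bundled copy.
    for so in dependents:
        subprocess.run(
            ["patchelf", "--replace-needed", old_soname, mangled_soname, so],
            check=True,
        )
        subprocess.run(["patchelf", "--set-rpath", "$ORIGIN/../uccl.libs", so], check=True)
```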

For projects like ROCm, we vendor all of our deps so that there are no system dependencies pulled in (and then we don't use auditwheel at all). PyTorch has found it to be unreliable for the scale of what they are doing and use a variety of manual steps, last I checked. But for something of the scale of what you are doing here, I've typically found that auditwheel does the right thing.

You're going to want to unzip the resulting wheel and use readelf -d to check DT_NEEDED and RUNPATH entries of anything depending on libgflags to spot what it is doing.
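
A minimal sketch of that inspection, assuming a made-up wheel filename:

```python
# Unpack the wheel and dump DT_NEEDED / RUNPATH entries of anything
# that mentions gflags.
import pathlib
import subprocess
import zipfile

wheel = "uccl-0.0.1-cp310-cp310-manylinux_2_28_x86_64.whl"  # hypothetical name
outdir = pathlib.Path("wheel-contents")
with zipfile.ZipFile(wheel) as zf:
    zf.extractall(outdir)

for so in sorted(outdir.rglob("*.so*")):
    dyn = subprocess.run(["readelf", "-d", str(so)], capture_output=True, text=True).stdout
    if "gflags" in dyn:
        print(f"== {so} ==")
        for line in dyn.splitlines():
            if "NEEDED" in line or "RPATH" in line or "RUNPATH" in line:
                print(line)
```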

@marbre marbre requested a review from stellaraccident October 6, 2025 08:33
@amd-ivaganev
Author

Thanks for the PR. @ScottTodd is on vacation, so I gave it a very quick first look. Also adding @stellaraccident to have a higher-level look and for visibility.

Thanks, appreciate your jumping in so I don't have to wonder who to engage with.

When building pytorch, we do a checkout separate from building the wheels. This might be beneficial here as well, as it allows us to not clone the repo every time and to iterate separately on the build if one wants to. Wdyt?

That's a good point. The repo management scripts looked rather complicated but I guess I can use this as an excuse to brush up on my git skills. Ideally I could re-use those as-is since they're quite parameterized but I'm not sure if symlinking them is allowed or a no-no. Making a copy would inevitably lead to divergence and missing fixes at a later point.

I'd like to do "the right thing"... Advice? Thoughts?

@marbre
Member

marbre commented Oct 6, 2025

Thanks for the PR. @ScottTodd is on vacation, so I gave it a very quick first look. Also adding @stellaraccident to have a higher-level look and for visibility.

Thanks, appreciate your jumping in so I don't have to wonder who to engage with.
👍

When building pytorch, we do a checkout separate from building the wheels. This might be beneficial here as well, as it allows us to not clone the repo every time and to iterate separately on the build if one wants to. Wdyt?

That's a good point. The repo management scripts looked rather complicated but I guess I can use this as an excuse to brush up on my git skills. Ideally I could re-use those as-is since they're quite parameterized but I'm not sure if symlinking them is allowed or a no-no. Making a copy would inevitably lead to divergence and missing fixes at a later point.

Scripts like https://github.com/ROCm/TheRock/blob/main/external-builds/pytorch/pytorch_torch_repo.py are indeed rather complicated, but you only need a fraction of that. torch is special in that we carried local patches (all gone now) and that some of the files need to be hipified. I think neither is needed for UCCL.

I'd like to do "the right thing"... Advice? Thoughts?

It can actually be a rather small script that just clones the repo, cut out from the current build script. To add some other nice features, additional arguments like --gitrepo-origin (to eventually point to a fork), --repo-hashtag (to point to a specific ref/tag to check out), --depth and others would be nice. To copy from pytorch_torch_repo,

  • --repo
  • --repo-name
  • --repo-hashtag
  • --gitrepo-origin
  • --depth
  • --jobs

should be good to go and covers everything that is needed from pytorch_torch_repo.py. Starting with some duplicated code should be acceptable; we could think about reusing and/or moving it to build_tools later if it turns out to be generic/reusable.
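
A minimal sketch of what such a script could look like (the argument names come from the list above; the default URL, ref, and directory are assumptions, not existing TheRock code):

```python
# Hypothetical sketch of a small uccl_repo.py-style checkout script.
import argparse
import subprocess
from pathlib import Path

def main():
    p = argparse.ArgumentParser(description="Check out UCCL sources for TheRock external builds")
    p.add_argument("--repo", default=str(Path(__file__).parent / "uccl"),
                   help="Local checkout directory")
    p.add_argument("--repo-name", default="uccl",
                   help="Project name (kept for parity with pytorch_torch_repo.py)")
    p.add_argument("--repo-hashtag", default="main",
                   help="Branch, tag, or commit to check out")
    p.add_argument("--gitrepo-origin", default="https://github.com/uccl-project/uccl.git",
                   help="Origin URL (point this at a fork if needed)")
    p.add_argument("--depth", type=int, default=1, help="Shallow clone depth")
    p.add_argument("--jobs", type=int, default=8, help="Parallel jobs for submodule update")
    args = p.parse_args()

    repo_dir = Path(args.repo)
    if not repo_dir.exists():
        subprocess.run(["git", "clone", "--depth", str(args.depth),
                        args.gitrepo_origin, str(repo_dir)], check=True)
    # Fetch and check out the requested ref, then bring submodules up to date.
    subprocess.run(["git", "-C", str(repo_dir), "fetch", "--depth", str(args.depth),
                    "origin", args.repo_hashtag], check=True)
    subprocess.run(["git", "-C", str(repo_dir), "checkout", "FETCH_HEAD"], check=True)
    subprocess.run(["git", "-C", str(repo_dir), "submodule", "update", "--init",
                    "--recursive", "--jobs", str(args.jobs)], check=True)

if __name__ == "__main__":
    main()
```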

@stellaraccident
Collaborator

Thanks for the PR. @ScottTodd is on vacation, so I gave it a very quick first look. Also adding @stellaraccident to have a higher-level look and for visibility.

Thanks, appreciate your jumping in so I don't have to wonder who to engage with.
👍

When building pytorch, we do a checkout separate from building the wheels. This might be beneficial here as well, as it allows us to not clone the repo every time and to iterate separately on the build if one wants to. Wdyt?

That's a good point. The repo management scripts looked rather complicated but I guess I can use this as an excuse to brush up on my git skills. Ideally I could re-use those as-is since they're quite parameterized but I'm not sure if symlinking them is allowed or a no-no. Making a copy would inevitably lead to divergence and missing fixes at a later point.

Scripts like https://github.com/ROCm/TheRock/blob/main/external-builds/pytorch/pytorch_torch_repo.py are indeed rather complicated, but you only need a fraction of that. torch is special in that we carried local patches (all gone now) and that some of the files need to be hipified. I think neither is needed for UCCL.

I'd like to do "the right thing"... Advice? Thoughts?

It can actually be a rather small script that just clones the repo, cut out from the current build script. To add some other nice features, additional arguments like --gitrepo-origin (to eventually point to a fork), --repo-hashtag (to point to a specific ref/tag to check out), --depth and others would be nice. To copy from pytorch_torch_repo,

  • --repo
  • --repo-name
  • --repo-hashtag
  • --gitrepo-origin
  • --depth
  • --jobs

should be good to go and covers everything that is needed from pytorch_torch_repo.py. Starting with some duplicated code should be acceptable; we could think about reusing and/or moving it to build_tools later if it turns out to be generic/reusable.

Yes, I doubt that pretty much any other project will have the complexity of what we have to deal with for pytorch. Best to not use that for inspiration.
