Add UCCL as an external build project. #1690
base: main
Conversation
Ultra & Unified CCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., IBGDA), with two key focuses:
- Flexibility for high performance in fast-evolving ML workloads
- Portability for connecting heterogeneous GPUs in ML workloads

For collectives, UCCL-collective serves as a drop-in replacement for NCCL/RCCL (e.g., requiring no changes to application code) and significantly outperforms them in both latency and throughput across various settings.
Thanks for the PR. @ScottTodd is on vacation this week, so I gave it a very quick first look. Also adding @stellaraccident to take a higher-level look and for visibility.
When building pytorch, we do a checkout separate from building the wheels. This might be beneficial here as well, as it allows us to not clone the repo every time and to iterate separately on the build if one wants to. Wdyt?
UCCL builds in an Ubuntu 22.04 docker container and outputs a manylinux wheel to a directory named `wheelbase-therock` in the repo
Ubuntu 22.04 vs. manylinux wheel doesn't sound right.
I thought about that quite a bit, actually. The upstream UCCL project has nvidia and rocm targets as well, plus some flavor of Ubuntu 22.04 container with those frameworks pre-installed, and they've worked out the apt dependencies their code needs when prepping their build container image. Using a plain Ubuntu 22.04 image as the base for the therock target was the least disruptive thing to propose for their repo.

Their build flow already has an auditwheel-based post-processing step that converts the artifact wheel to a manylinux one, and I've tested it in combination with TheRock manylinux packages. Not being a python packaging expert, I wasn't quite sure if the upheaval of forcing their build into the same container image would buy us anything, since the output in both cases is a manylinux wheel.

Also, it's not easy to figure out the correct matching PyPA image since its specifics are hidden in the checkout scripts' Python, and marrying those checkout scripts with the upstream UCCL build scripts was a bad fit.
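For reference, a minimal sketch of what such an auditwheel-based conversion step can look like; the input directory and the platform tag here are placeholders, not the values used by the actual UCCL build scripts:

```python
# Hedged sketch only: directories and platform tag are placeholders, not the
# values from UCCL's real build scripts.
import glob
import subprocess

for wheel in glob.glob("dist/*.whl"):
    # `auditwheel repair` copies external shared libraries into the wheel,
    # rewrites their SONAMEs/RPATHs, and retags the wheel as manylinux.
    subprocess.run(
        [
            "auditwheel", "repair", wheel,
            "--plat", "manylinux_2_28_x86_64",  # assumed target platform tag
            "-w", "wheelbase-therock",          # assumed to match the README's output dir
        ],
        check=True,
    )
```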
In general, the combination of a manylinux image and auditwheel is required to build the most portable of artifacts. Just subbing in Ubuntu is rarely a sufficient replacement for using a RHEL/Alma DevToolset (which is what manylinux is based on). There are several things done in those toolchains that increase their compatibility. While it is possible to emulate that on other Linux distributions, it can get quite nit-picky to audit properly. Some of those compatibility guarantees come from how RHEL devtoolsets surgically extend compatible glibc/libstdc++ builds of the libraries -- it isn't just something you can "convert" to with manylinux. It is rooted in very specific ways those toolchains are set up.
For maximum compatibility, I'd recommend using our manylinux build docker (or a derivative). That is mostly a stock manylinux docker with a couple of alterations:
- We install some common sense dev tools.
- We downgrade to devtoolset 12. Upstream manylinux rolls to the newest devtoolset eagerly, but we stay pinned to a version that we coordinate ROCm- and PyTorch-wide.
Also, the manylinux image already has all Python versions installed in it and can be used to build for any version.
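To make that concrete, here is a minimal sketch of driving a build from inside such a container, picking one of the interpreters preinstalled under /opt/python; the image name and the build command are placeholders, not TheRock's actual image or UCCL's real entry point:

```python
# Hedged sketch: image name and build command are placeholders.
import os
import subprocess

IMAGE = "therock-build-manylinux-x86_64:latest"  # assumed image name
PYTHON = "/opt/python/cp312-cp312/bin/python"    # manylinux layout: one interpreter per CPython version

subprocess.run(
    [
        "docker", "run", "--rm",
        "-v", f"{os.getcwd()}:/src",
        "-w", "/src",
        IMAGE,
        PYTHON, "-m", "pip", "wheel", ".", "-w", "dist/",  # placeholder build step
    ],
    check=True,
)
```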
@stellaraccident I figured out how to adapt their build to our manylinux container and tracked down RHEL equivalents for some extra packages they need for their build, but now I am running into another problem.
Trying to load the resulting package on Ubuntu 22.04 runs into a missing library, libgflags.so.2.1, though the Ubuntu host has libgflags.so.2.2. I'm confused how manylinux python packaging is supposed to work with the inevitable shared object versioning mismatches across distributions.
Any advice?
auditwheel is supposed to surgically include such DSOs (the procedure it uses is to copy the dep into the wheel, change the SONAME to a unique name, and update all things that depend on it to use a relative RPATH and the updated SONAME). However, it is sometimes finicky.
For projects like ROCm, we vendor all of our deps so that there are no system dependencies pulled in (and then we don't use auditwheel at all). PyTorch has found it to be unreliable for the scale of what they are doing and use a variety of manual steps, last I checked. But for something of the scale of what you are doing here, I've typically found that auditwheel does the right thing.
You're going to want to unzip the resulting wheel and use `readelf -d` to check the DT_NEEDED and RUNPATH entries of anything depending on libgflags to spot what it is doing.
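A small sketch of that kind of check; the wheel filename below is a placeholder for whatever the build actually produces:

```python
# Hedged sketch: the wheel filename is a placeholder; adjust to the real artifact.
import glob
import subprocess
import zipfile

WHEEL = "wheelbase-therock/uccl-0.0.0-cp312-cp312-manylinux_2_28_x86_64.whl"  # placeholder

with zipfile.ZipFile(WHEEL) as zf:
    zf.extractall("wheel-contents")

# For every shared object in the wheel, print its DT_NEEDED entries (the SONAMEs
# it expects, e.g. an auditwheel-mangled libgflags copy) and its RUNPATH/RPATH
# (where the loader is told to look for them).
for so in glob.glob("wheel-contents/**/*.so*", recursive=True):
    dyn = subprocess.run(["readelf", "-d", so], capture_output=True, text=True).stdout
    interesting = [line for line in dyn.splitlines()
                   if "NEEDED" in line or "RUNPATH" in line or "RPATH" in line]
    if interesting:
        print(f"== {so} ==")
        print("\n".join(interesting))
```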
Thanks, appreciate your jumping in so I don't have to wonder who to engage with.
That's a good point. The repo management scripts looked rather complicated, but I guess I can use this as an excuse to brush up. I'd like to do "the right thing"... Advice? Thoughts?
Scripts like https://github.com/ROCm/TheRock/blob/main/external-builds/pytorch/pytorch_torch_repo.py are indeed rather complicated, but you only need a fraction of that. torch is special in that we carried local patches (all gone now) and that some of the files need to be hipified. I think neither is needed for UCCL.
It can actually be a rather small script that just clones the repo, cut out from the current build script, with additional arguments for some other nice features. That should be good to go, together with everything that is needed from the current build script.
Yes, I doubt that pretty much any other project will have the complexity of what we have to deal with for pytorch. Best to not use that for inspiration.
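Something like the following minimal sketch might be enough; the script name, flag names, and default URL are made up for illustration and are not decisions made in this PR:

```python
#!/usr/bin/env python
# Hedged sketch of a small standalone checkout script; uccl_repo.py, the flags,
# and the default URL are illustrative only.
import argparse
import subprocess
from pathlib import Path


def main():
    parser = argparse.ArgumentParser(description="Clone UCCL sources for the external build")
    parser.add_argument("--repo-url", default="https://github.com/uccl-project/uccl.git")
    parser.add_argument("--ref", default="main", help="branch, tag, or commit to check out")
    parser.add_argument("--dir", type=Path, default=Path("uccl"), help="checkout directory")
    args = parser.parse_args()

    # Clone once, then fetch and check out the requested ref on later runs,
    # so the repo is not re-cloned every time.
    if not args.dir.exists():
        subprocess.run(["git", "clone", args.repo_url, str(args.dir)], check=True)
    subprocess.run(["git", "-C", str(args.dir), "fetch", "origin", args.ref], check=True)
    subprocess.run(["git", "-C", str(args.dir), "checkout", "FETCH_HEAD"], check=True)


if __name__ == "__main__":
    main()
```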
Motivation
Add UCCL as an external project to TheRock.
UCCL provides a performant and easy-to-use network congestion control plugin for RCCL collectives, with significant benefits for training and inference at scale. It also contains a performant transfer engine that is actively being developed by academic partners.
Lowering the barrier to clients' use of these has obvious benefits for AMD.
Technical Details
This PR is small and localized -- it only adds a python build script for the UCCL external project and a README.md file.
The majority of the work has already been pushed and merged into the upstream UCCL project and consisted of modifying its build scripts to accommodate a new `therock` build target, parameterized in a way that makes inclusion here (as an external project) easy.
Test Plan
The build script and all its options have been tested with the (already merged) upstream functionality.
Test Result
Tests of cloning and building succeed and produce manylinux wheels of UCCL dependent on TheRock wheels.
Submission Checklist