CUDA backend: compile #2276

Merged
merged 2 commits into from
Jun 13, 2025

zcbenz
Collaborator

@zcbenz zcbenz commented Jun 12, 2025

This PR is split from #1983.

Adds a JitModule class that wraps NVRTC APIs, and implements the Compiled primitive.

The NVRTC API accepts a list of in-memory headers that it searches when resolving #include. So unlike the Metal backend, which uses a script to preprocess the kernel files, the CUDA backend takes a platform-independent approach: it embeds the files under backend/cuda/kernels/ into char arrays in a generated header and then passes the list to NVRTC.

There are also a few implementation details caused by limitations of NVRTC:

  • The cuLaunchKernel API, which is used for launching compiled kernels, requires the arguments to be passed as an array of pointers. To avoid errors such as passing pointers to temporary values, and to simplify usage, the passed args are copied to a list of variants managed by JitModule.
  • Getting a kernel function from a compiled module requires querying with the mangled C++ function name. While NVRTC provides APIs to get the mangled name from the C++ name, they can only be used after compiling the program. So when storing the compiled module in the disk cache, we also have to store the map of kernel names.
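
Both points can be sketched in plain C++, without claiming this is how JitModule is written. `KernelArgs`, `NameMap`, `serialize_names`, and `parse_names` are hypothetical names, and the tab-separated cache format is an assumption made for illustration. The first part keeps owned copies of the arguments so that the `void**` array handed to `cuLaunchKernel` as `kernelParams` never points at temporaries. The second part round-trips a map from C++ kernel name to lowered (mangled) name, such as one collected through `nvrtcGetLoweredName` after compilation.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <variant>
#include <vector>

// Owns a copy of every kernel argument. The set of alternatives is an
// assumption; a real backend would list whatever types its kernels take.
class KernelArgs {
 public:
  using Arg = std::variant<int32_t, uint32_t, int64_t, float, double, void*>;

  template <typename T>
  void append(T value) {
    storage_.emplace_back(value);
  }

  // Builds the pointer array for the launch call. Call this after the last
  // append(): a later append may reallocate storage_ and invalidate it.
  std::vector<void*> pointers() {
    std::vector<void*> out;
    out.reserve(storage_.size());
    for (auto& arg : storage_) {
      out.push_back(std::visit([](auto& v) -> void* { return &v; }, arg));
    }
    return out;
  }

 private:
  std::vector<Arg> storage_;
};

// Maps the C++ kernel name to its lowered (mangled) name.
using NameMap = std::map<std::string, std::string>;

// Assumed cache format: one "name<TAB>lowered" pair per line. This relies on
// neither string containing a tab or a newline.
std::string serialize_names(const NameMap& names) {
  std::string out;
  for (const auto& [name, lowered] : names) {
    out += name + '\t' + lowered + '\n';
  }
  return out;
}

NameMap parse_names(const std::string& text) {
  NameMap names;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    const auto tab = line.find('\t');
    if (tab == std::string::npos) {
      continue;  // skip malformed lines
    }
    names[line.substr(0, tab)] = line.substr(tab + 1);
  }
  return names;
}
```

Storing the name map next to the compiled module means a cache hit can look up kernels by lowered name directly, with no NVRTC program object needed.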

Comment on lines +46 to +47
"${CMAKE_CURRENT_SOURCE_DIR}/kernels/*.h"
"${CMAKE_CURRENT_SOURCE_DIR}/kernels/*.cuh")
Member

This is a bit of a nit, but the directory layout has a confusing naming scheme. It seems like kernels/ is not actually where the "kernels" go, but where some of the utilities for the kernels go.

Can we reconsider the name of this subdirectory? Or alternatively just flatten the directory structure all into cuda and explicitly list the sources we want to include here?

Collaborator Author

The files under kernels/ must be compatible with JIT compilation, which imposes more limitations than normal CUDA code. The biggest difference is that they cannot contain any host-only code. I used "kernels" as the name to keep consistency with the Metal backend; a more precise name could be "device"?

Member

Yea I like device more. That's a great suggestion!

@zcbenz
Collaborator Author

zcbenz commented Jun 13, 2025

I added a commit to rename kernels/ to device/. It is going to cause lots of conflicts with other PRs, so you might want to merge that one later.

@awni
Member

awni commented Jun 13, 2025

Thanks! I'll manage the conflicts.. should be fairly straightforward.

@awni awni merged commit a4fc671 into ml-explore:main Jun 13, 2025
1 check was pending
@zcbenz zcbenz deleted the cuda-compile branch July 6, 2025 11:24