CUDA backend: compile #2276

zcbenz · 2025-06-12T02:31:11Z

This PR is split from #1983.

Adds a JitModule class that wraps NVRTC APIs, and implements the Compiled primitive.

The NVRTC API accepts a list of headers to search when resolving #include directives. So unlike the Metal backend, which uses a script to preprocess the kernel files, the CUDA backend takes a platform-independent approach: it embeds the files under backend/cuda/kernels/ into char arrays in a generated header and then passes the list to NVRTC.
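To make the embedding idea concrete, here is a minimal sketch of what such a generated header and its lookup table could look like. The namespace `embedded`, the file names, the file contents, and `collect_headers` are all hypothetical and are not taken from this PR; only the `nvrtcCreateProgram` parameter order shown in the trailing comment is the documented NVRTC signature.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical shape of the generated header: one char array per embedded
// file. In a real build these would be produced by a build step from the
// files on disk; the names and contents here are made up for illustration.
namespace embedded {
constexpr char utils_cuh[] =
    "#pragma once\n"
    "__device__ inline int twice(int x) { return 2 * x; }\n";
constexpr char ops_cuh[] =
    "#pragma once\n"
    "#include \"utils.cuh\"\n";
} // namespace embedded

// Two parallel arrays, which is the layout NVRTC expects.
struct EmbeddedHeaders {
  std::vector<const char*> names;   // names as written in #include "..."
  std::vector<const char*> sources; // matching file contents
};

inline EmbeddedHeaders collect_headers() {
  EmbeddedHeaders h;
  h.names = {"utils.cuh", "ops.cuh"};
  h.sources = {embedded::utils_cuh, embedded::ops_cuh};
  assert(h.names.size() == h.sources.size());
  return h;
}

// With NVRTC available, the arrays would be handed over like this
// (header sources first, include names second):
//   nvrtcCreateProgram(&prog, kernel_source, "kernel.cu",
//                      static_cast<int>(h.names.size()),
//                      h.sources.data(), h.names.data());
```

Because the headers live in the binary, JIT compilation at runtime does not depend on the source tree being present on the target machine.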

There are also a few implementation details caused by limitations of NVRTC:

The cuLaunchKernel API, which is used for launching compiled kernels, requires the arguments to be passed as an array of pointers. To avoid errors like passing pointers to temporary values, and to simplify usage, the passed args are copied to a list of variants managed by JitModule.
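The argument handling described above might be sketched roughly as follows. `KernelArgs`, its method names, and the chosen set of variant alternatives are assumptions for illustration and do not reflect the actual JitModule interface; the point is only that each value is copied into owned storage before its address is taken.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical helper: owns copies of the kernel arguments so the pointer
// array given to the launch call never refers to temporaries.
class KernelArgs {
 public:
  using Arg = std::variant<int, unsigned, long long, float, double, const void*>;

  // Copy a scalar into owned storage. T must be one of the alternatives.
  template <typename T>
  void append(T value) {
    storage_.emplace_back(std::in_place_type<T>, value);
  }

  // Copy a pointer value (for example a device buffer address).
  void append_ptr(const void* p) {
    storage_.emplace_back(std::in_place_type<const void*>, p);
  }

  std::size_t size() const { return storage_.size(); }

  // Build the void** array lazily, so that growth of storage_ cannot leave
  // dangling pointers. Valid until the next append or args() call.
  void** args() {
    ptrs_.clear();
    for (auto& a : storage_) {
      ptrs_.push_back(std::visit([](auto& v) -> void* { return &v; }, a));
    }
    assert(ptrs_.size() == storage_.size());
    return ptrs_.data();
  }

 private:
  std::vector<Arg> storage_;
  std::vector<void*> ptrs_;
};

// With the driver API available, the result would be passed as kernelParams:
//   cuLaunchKernel(func, gx, gy, gz, bx, by, bz, shared_bytes, stream,
//                  kernel_args.args(), nullptr);
```

Building the pointer array only when it is requested is one way to keep the addresses valid; storing the variants in a container with stable element addresses would be another.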
Getting a kernel function from a compiled module requires querying with the mangled C++ function name. While NVRTC provides APIs to get the mangled name from the C++ name, they can only be used after compiling the program. So when storing the compiled module in the disk cache, we also have to store the map of kernel names.
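As a rough illustration of why the name map has to travel with the cached module, the sketch below stores the compiled code together with a C++ name to mangled name table and reads both back. `CachedModule`, `save`, `load`, `lowered`, and the line-based layout are hypothetical; the on-disk format used by this PR is not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical cache entry: compiled code plus the names needed to look up
// kernels in it without re-running the compiler.
struct CachedModule {
  std::string ptx;
  std::map<std::string, std::string> lowered_names; // C++ name -> mangled name
};

// Layout (an assumption): entry count, then name/mangled line pairs, then
// the compiled code as the remainder of the stream.
inline void save(std::ostream& os, const CachedModule& m) {
  os << m.lowered_names.size() << '\n';
  for (const auto& [name, mangled] : m.lowered_names) {
    os << name << '\n' << mangled << '\n';
  }
  os << m.ptx;
}

inline CachedModule load(std::istream& is) {
  CachedModule m;
  std::string line;
  std::getline(is, line);
  const std::size_t count = std::stoul(line);
  for (std::size_t i = 0; i < count; ++i) {
    std::string name, mangled;
    std::getline(is, name);
    std::getline(is, mangled);
    m.lowered_names.emplace(std::move(name), std::move(mangled));
  }
  m.ptx.assign(std::istreambuf_iterator<char>(is),
               std::istreambuf_iterator<char>());
  return m;
}

// On a cache hit there is no compiled program to query, so the mangled name
// must come from the stored map.
inline const std::string& lowered(const CachedModule& m, const std::string& name) {
  auto it = m.lowered_names.find(name);
  assert(it != m.lowered_names.end() && "kernel name was not recorded");
  return it->second;
}
```

On a cache miss, the map would be filled right after compilation, when the mangled names can be queried from NVRTC, and then written out alongside the compiled code.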

awni · 2025-06-12T16:11:04Z

mlx/backend/cuda/CMakeLists.txt

+  "${CMAKE_CURRENT_SOURCE_DIR}/kernels/*.h"
+  "${CMAKE_CURRENT_SOURCE_DIR}/kernels/*.cuh")


This is a bit of a nit, but the directory layout has a confusing naming scheme. It seems like kernels/ is not actually where the "kernels" go, but where some of the utilities for the kernels go.

Can we reconsider the name of this subdirectory? Or alternatively just flatten the directory structure all into cuda and explicitly list the sources we want to include here?

The files under kernels/ must be compatible with JIT compilation, which imposes more limitations than normal CUDA code; the biggest difference is that they cannot contain any host-only code. I used "kernels" as the name to stay consistent with the Metal backend. A more precise name could be "device"?

Yea I like device more. That's a great suggestion!

zcbenz · 2025-06-13T00:00:54Z

I added a commit to rename kernels/ to device/. It is going to cause lots of conflicts with other PRs, so you might want to merge that one later.

awni · 2025-06-13T00:08:14Z

Thanks! I'll manage the conflicts; it should be fairly straightforward.

CUDA backend: compile

b2dd60c

zcbenz mentioned this pull request Jun 12, 2025

CUDA backend: indexing ops #2277

Merged

awni reviewed Jun 12, 2025

Rename kernels/ to device/

ef9495f

awni approved these changes Jun 13, 2025

awni merged commit a4fc671 into ml-explore:main Jun 13, 2025
1 check was pending

zcbenz deleted the cuda-compile branch July 6, 2025 11:24

BrewTestBot mentioned this pull request Jul 25, 2025

mlx 0.27.1 Homebrew/homebrew-core#231260

Merged
