
Conversation

@yzh119 (Collaborator) commented Aug 24, 2025

📌 Description

As reported in #1557 (comment), our unit tests do not cover the CUDA 12.6 environment. This PR fixes the issue by adding cu126/cu128/cu129 unit tests to CI.

🔍 Related Issues

#1557 (comment)

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

cc @yongwww @zhyncs

@gemini-code-assist (bot, Contributor) left a comment


Summary of Changes

Hello @yzh119, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the project's continuous integration capabilities by extending unit test coverage to include CUDA versions 12.6, 12.8, and 12.9. This change addresses a reported gap in testing for newer CUDA environments, ensuring that the project's codebase remains robust and compatible across a wider range of NVIDIA GPU setups. The modifications involve updating the Jenkins pipeline to orchestrate these new test runs and adjusting test scripts to accommodate version-specific execution.

Highlights

  • Expanded CUDA Version Testing: The CI pipeline now includes dedicated test runs for CUDA versions 12.6, 12.8, and 12.9, ensuring broader compatibility and stability across different NVIDIA environments.
  • CI Configuration and Test Function Updates: The Jenkinsfile was updated to define specific Docker images for each new CUDA version, and the core unit test functions (for CPU AOT compile and GPU JIT tests) were refactored to accept a CUDA version parameter, making the test execution more flexible and version-aware.
  • Simplified Import Test: The AOT build import test script was simplified to a basic 'import flashinfer' check, streamlining the verification process for successful library installation.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (bot, Contributor) left a comment


Code Review

This pull request successfully adds CI tests for multiple CUDA versions (12.6, 12.8, 12.9) as intended. The changes primarily involve modifying the Jenkinsfile to parameterize test execution with the CUDA version and adding parallel stages for each version.

My review focuses on improving the maintainability of the Jenkinsfile by reducing significant code duplication. I've provided suggestions to refactor repeated logic into helper functions and use data structures like maps to make the pipeline script more concise and easier to extend. Additionally, I've pointed out a potential regression in a test script where a check was simplified, possibly weakening the test's effectiveness.

Jenkinsfile Outdated
Comment on lines 229 to 241
try {
  run_unittest_CPU_AOT_COMPILE('CPU-LARGE-SPOT', 'cu126')
} catch (Throwable ex) {
  echo 'Exception during SPOT run ' + ex.toString()
  if (is_last_build()) {
    echo 'Exception during SPOT run ' + ex.toString() + ' retry on-demand'
    currentBuild.result = 'SUCCESS'
    run_unittest_CPU_AOT_COMPILE('CPU-LARGE', 'cu126')
  } else {
    echo 'Exit since it is not last build'
    throw ex
  }
}

Severity: high

This try-catch block for handling SPOT instance failures is repeated for every test job in the parallel stage (18 times in total). This leads to a lot of duplicated code, making the Jenkinsfile hard to read and maintain.

You can refactor this logic into a helper function. This will significantly reduce code duplication and improve readability.

Here's an example of what the helper function could look like:

def runWithSpotRetry(spotClosure, onDemandClosure) {
  try {
    spotClosure()
  } catch (Throwable ex) {
    echo 'Exception during SPOT run ' + ex.toString()
    if (is_last_build()) {
      echo 'Exception during SPOT run ' + ex.toString() + ' retry on-demand'
      currentBuild.result = 'SUCCESS'
      onDemandClosure()
    } else {
      echo 'Exit since it is not last build'
      throw ex
    }
  }
}

And you would use it like this:

'AOT-Build-Import-x86-64-cu126': {
  runWithSpotRetry(
    { run_unittest_CPU_AOT_COMPILE('CPU-LARGE-SPOT', 'cu126') },
    { run_unittest_CPU_AOT_COMPILE('CPU-LARGE', 'cu126') }
  )
},

Applying this pattern would make the pipeline script much more concise and maintainable. You could even take it a step further and generate the parallel stages dynamically in a loop.
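As a sketch of that further step, the parallel stages could be generated in a loop. This assumes the runWithSpotRetry helper proposed above, and the stage names mirror the existing ones; treat it as an illustration of the pattern rather than a drop-in replacement:

```groovy
// Sketch: build the parallel job map in a loop instead of hand-writing
// one entry per CUDA version. Assumes runWithSpotRetry and the existing
// run_unittest_CPU_AOT_COMPILE function are in scope.
def cuda_versions = ['cu126', 'cu128', 'cu129']
def jobs = [:]
cuda_versions.each { cv ->
  jobs["AOT-Build-Import-x86-64-${cv}"] = {
    runWithSpotRetry(
      { run_unittest_CPU_AOT_COMPILE('CPU-LARGE-SPOT', cv) },
      { run_unittest_CPU_AOT_COMPILE('CPU-LARGE', cv) }
    )
  }
}
parallel jobs
```

Using .each rather than a plain for loop keeps each closure bound to its own cv value, which matters when the closures run concurrently under parallel.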

Comment on lines +43 to +45
docker_run_cu126 = "bash ci/bash.sh flashinfer/flashinfer-ci-cu126:latest"
docker_run_cu128 = "bash ci/bash.sh flashinfer/flashinfer-ci-cu128:latest"
docker_run_cu129 = "bash ci/bash.sh flashinfer/flashinfer-ci-cu129:latest"

Severity: medium

To improve maintainability and make it easier to add or remove CUDA versions in the future, consider defining these Docker run commands in a Groovy map. This centralizes the configuration and works well with the suggestion to simplify the command selection logic in the test functions.

docker_runs = ["cu126": "bash ci/bash.sh flashinfer/flashinfer-ci-cu126:latest",
               "cu128": "bash ci/bash.sh flashinfer/flashinfer-ci-cu128:latest",
               "cu129": "bash ci/bash.sh flashinfer/flashinfer-ci-cu129:latest"]

Comment on lines +114 to +123
def docker_run = ""
if (cuda_version == "cu126") {
docker_run = docker_run_cu126
} else if (cuda_version == "cu128") {
docker_run = docker_run_cu128
} else if (cuda_version == "cu129") {
docker_run = docker_run_cu129
} else {
error("Unknown CUDA version: ${cuda_version}")
}

Severity: medium

This if/else if block for selecting the docker command is duplicated in shard_run_unittest_GPU (lines 170-179). By using a map for the docker commands (as suggested in another comment), you can simplify this logic to a single lookup. This removes code duplication and makes the script cleaner and easier to maintain.

  def docker_run = docker_runs[cuda_version]
  if (docker_run == null) {
    error("Unknown CUDA version: ${cuda_version}")
  }

@@ -14,4 +14,4 @@ pip install dist/*.whl
 # test import
 mkdir -p tmp
 cd tmp
-python -c "from flashinfer.page import gen_page_module; p = gen_page_module().aot_path; print(p); assert p.exists();"
+python -c "import flashinfer"

Severity: medium

The AOT import test has been simplified to just import flashinfer. The previous version of this test was more specific and checked for the existence of AOT-compiled artifacts (gen_page_module().aot_path.exists()). The current check is less thorough and might not catch issues where AOT compilation fails silently for some modules. Consider restoring a more specific check to ensure the AOT artifacts are correctly generated and accessible.

Suggested change
-python -c "import flashinfer"
+python -c "from flashinfer.page import gen_page_module; p = gen_page_module().aot_path; print(p); assert p.exists()"
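If both signals are useful, the two checks need not be mutually exclusive. A sketch of a combined test-import step, reusing only commands already present in this script (the broad smoke test plus the specific artifact assertion):

```shell
# Combined import test: keep the quick "does it import at all" check,
# then assert the AOT-compiled artifact actually exists on disk so a
# silently failed AOT build still fails CI.
python -c "import flashinfer"
python -c "from flashinfer.page import gen_page_module; p = gen_page_module().aot_path; print(p); assert p.exists()"
```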

@zhyncs (Member) commented Aug 24, 2025

cool

@yzh119 yzh119 enabled auto-merge (squash) August 24, 2025 09:27
@yongwww (Collaborator) left a comment


Thanks so much for all the great effort!

@zhyncs (Member) commented Aug 25, 2025

@yzh119 is it ready, and when will we release a new version?

@yzh119 (Collaborator, Author) commented Aug 25, 2025

still one error left

@yzh119 (Collaborator, Author) commented Aug 25, 2025

Should be fixed in d41a6c1; we will release the post-fix as soon as CI passes.

@yzh119 yzh119 merged commit 811a27b into flashinfer-ai:main Aug 25, 2025
2 checks passed
yzh119 added a commit that referenced this pull request Aug 25, 2025
<!-- .github/pull_request_template.md -->

## 📌 Description

v0.2.14 cannot be used by sglang because of cu126 compilation issues;
this release fixes the problem.

## 🔍 Related Issues

#1560

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Reviewer Notes

<!-- Optional: anything you'd like reviewers to focus on, concerns, etc.
-->