
Conversation

@yzh119 (Collaborator) commented Aug 24, 2025

📌 Description

As reported in #1557 (comment), our unit tests do not cover the CUDA 12.6 environment. This PR fixes the issue by adding cu126/cu128/cu129 unit tests to CI.

🔍 Related Issues

#1557 (comment)

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

cc @yongwww @zhyncs

@gemini-code-assist (bot, Contributor) left a comment


Summary of Changes

Hello @yzh119, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the project's continuous integration capabilities by extending unit test coverage to include CUDA versions 12.6, 12.8, and 12.9. This change addresses a reported gap in testing for newer CUDA environments, ensuring that the project's codebase remains robust and compatible across a wider range of NVIDIA GPU setups. The modifications involve updating the Jenkins pipeline to orchestrate these new test runs and adjusting test scripts to accommodate version-specific execution.

Highlights

  • Expanded CUDA Version Testing: The CI pipeline now includes dedicated test runs for CUDA versions 12.6, 12.8, and 12.9, ensuring broader compatibility and stability across different NVIDIA environments.
  • CI Configuration and Test Function Updates: The Jenkinsfile was updated to define specific Docker images for each new CUDA version, and the core unit test functions (for CPU AOT compile and GPU JIT tests) were refactored to accept a CUDA version parameter, making the test execution more flexible and version-aware.
  • Simplified Import Test: The AOT build import test script was simplified to a basic 'import flashinfer' check, streamlining the verification process for successful library installation.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (bot, Contributor) left a comment


Code Review

This pull request successfully adds CI tests for multiple CUDA versions (12.6, 12.8, 12.9) as intended. The changes primarily involve modifying the Jenkinsfile to parameterize test execution with the CUDA version and adding parallel stages for each version.

My review focuses on improving the maintainability of the Jenkinsfile by reducing significant code duplication. I've provided suggestions to refactor repeated logic into helper functions and use data structures like maps to make the pipeline script more concise and easier to extend. Additionally, I've pointed out a potential regression in a test script where a check was simplified, possibly weakening the test's effectiveness.

Jenkinsfile Outdated
Comment on lines 229 to 241
try {
  run_unittest_CPU_AOT_COMPILE('CPU-LARGE-SPOT', 'cu126')
} catch (Throwable ex) {
  echo 'Exception during SPOT run ' + ex.toString()
  if (is_last_build()) {
    echo 'Exception during SPOT run ' + ex.toString() + ' retry on-demand'
    currentBuild.result = 'SUCCESS'
    run_unittest_CPU_AOT_COMPILE('CPU-LARGE', 'cu126')
  } else {
    echo 'Exit since it is not last build'
    throw ex
  }
}

Severity: high

This try-catch block for handling SPOT instance failures is repeated for every test job in the parallel stage (18 times in total). This leads to a lot of duplicated code, making the Jenkinsfile hard to read and maintain.

You can refactor this logic into a helper function. This will significantly reduce code duplication and improve readability.

Here's an example of what the helper function could look like:

def runWithSpotRetry(spotClosure, onDemandClosure) {
  try {
    spotClosure()
  } catch (Throwable ex) {
    echo 'Exception during SPOT run ' + ex.toString()
    if (is_last_build()) {
      echo 'Exception during SPOT run ' + ex.toString() + ' retry on-demand'
      currentBuild.result = 'SUCCESS'
      onDemandClosure()
    } else {
      echo 'Exit since it is not last build'
      throw ex
    }
  }
}

And you would use it like this:

'AOT-Build-Import-x86-64-cu126': {
  runWithSpotRetry(
    { run_unittest_CPU_AOT_COMPILE('CPU-LARGE-SPOT', 'cu126') },
    { run_unittest_CPU_AOT_COMPILE('CPU-LARGE', 'cu126') }
  )
},

Applying this pattern would make the pipeline script much more concise and maintainable. You could even take it a step further and generate the parallel stages dynamically in a loop.
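As a sketch of that further step, the parallel stages could be generated in a loop. This assumes the runWithSpotRetry helper proposed above, and the stage names mirror the existing ones; treat it as an illustration of the pattern rather than a drop-in replacement:

```groovy
// Sketch: build the parallel job map in a loop instead of hand-writing
// one entry per CUDA version. Assumes runWithSpotRetry and the existing
// run_unittest_CPU_AOT_COMPILE function are in scope.
def cuda_versions = ['cu126', 'cu128', 'cu129']
def jobs = [:]
cuda_versions.each { cv ->
  jobs["AOT-Build-Import-x86-64-${cv}"] = {
    runWithSpotRetry(
      { run_unittest_CPU_AOT_COMPILE('CPU-LARGE-SPOT', cv) },
      { run_unittest_CPU_AOT_COMPILE('CPU-LARGE', cv) }
    )
  }
}
parallel jobs
```

Using .each rather than a plain for loop keeps each closure bound to its own cv value, which matters when the closures run concurrently under parallel.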

Comment on lines +43 to +45
docker_run_cu126 = "bash ci/bash.sh flashinfer/flashinfer-ci-cu126:latest"
docker_run_cu128 = "bash ci/bash.sh flashinfer/flashinfer-ci-cu128:latest"
docker_run_cu129 = "bash ci/bash.sh flashinfer/flashinfer-ci-cu129:latest"

Severity: medium

To improve maintainability and make it easier to add or remove CUDA versions in the future, consider defining these Docker run commands in a Groovy map. This centralizes the configuration and works well with the suggestion to simplify the command selection logic in the test functions.

docker_runs = ["cu126": "bash ci/bash.sh flashinfer/flashinfer-ci-cu126:latest",
               "cu128": "bash ci/bash.sh flashinfer/flashinfer-ci-cu128:latest",
               "cu129": "bash ci/bash.sh flashinfer/flashinfer-ci-cu129:latest"]

Comment on lines +114 to +123
def docker_run = ""
if (cuda_version == "cu126") {
docker_run = docker_run_cu126
} else if (cuda_version == "cu128") {
docker_run = docker_run_cu128
} else if (cuda_version == "cu129") {
docker_run = docker_run_cu129
} else {
error("Unknown CUDA version: ${cuda_version}")
}

Severity: medium

This if/else if block for selecting the docker command is duplicated in shard_run_unittest_GPU (lines 170-179). By using a map for the docker commands (as suggested in another comment), you can simplify this logic to a single lookup. This removes code duplication and makes the script cleaner and easier to maintain.

  def docker_run = docker_runs[cuda_version]
  if (docker_run == null) {
    error("Unknown CUDA version: ${cuda_version}")
  }

@@ -14,4 +14,4 @@ pip install dist/*.whl
 # test import
 mkdir -p tmp
 cd tmp
-python -c "from flashinfer.page import gen_page_module; p = gen_page_module().aot_path; print(p); assert p.exists();"
+python -c "import flashinfer"

Severity: medium

The AOT import test has been simplified to just import flashinfer. The previous version of this test was more specific and checked for the existence of AOT-compiled artifacts (gen_page_module().aot_path.exists()). The current check is less thorough and might not catch issues where AOT compilation fails silently for some modules. Consider restoring a more specific check to ensure the AOT artifacts are correctly generated and accessible.

Suggested change
-python -c "import flashinfer"
+python -c "from flashinfer.page import gen_page_module; p = gen_page_module().aot_path; print(p); assert p.exists()"
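If both signals are useful, the two checks need not be mutually exclusive. A sketch of a combined test-import step, reusing only commands already present in this script (the broad smoke test plus the specific artifact assertion):

```shell
# Combined import test: keep the quick "does it import at all" check,
# then assert the AOT-compiled artifact actually exists on disk so a
# silently failed AOT build still fails CI.
python -c "import flashinfer"
python -c "from flashinfer.page import gen_page_module; p = gen_page_module().aot_path; print(p); assert p.exists()"
```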

@zhyncs (Member) commented Aug 24, 2025

cool

@yzh119 yzh119 enabled auto-merge (squash) August 24, 2025 09:27
@yongwww (Collaborator) left a comment


Thanks so much for all the great effort!

@zhyncs (Member) commented Aug 25, 2025

@yzh119 is it ready, and when will we release a new version?

@yzh119 (Collaborator, Author) commented Aug 25, 2025

still one error left

@yzh119 (Collaborator, Author) commented Aug 25, 2025

Should be fixed in d41a6c1; we will release the post-fix as soon as CI passes.

@yzh119 yzh119 merged commit 811a27b into flashinfer-ai:main Aug 25, 2025
2 checks passed
yzh119 added a commit that referenced this pull request Aug 25, 2025
<!-- .github/pull_request_template.md -->

## 📌 Description

v0.2.14 cannot be used by sglang because of cu126 compilation issues;
this release fixes the problem.

## 🔍 Related Issues

#1560

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Reviewer Notes

<!-- Optional: anything you'd like reviewers to focus on, concerns, etc.
-->