ci: add unittest for different cuda version #1560
Conversation
Summary of Changes
Hello @yzh119, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the project's continuous integration capabilities by extending unit test coverage to include CUDA versions 12.6, 12.8, and 12.9. This change addresses a reported gap in testing for newer CUDA environments, ensuring that the project's codebase remains robust and compatible across a wider range of NVIDIA GPU setups. The modifications involve updating the Jenkins pipeline to orchestrate these new test runs and adjusting test scripts to accommodate version-specific execution.
Highlights
- Expanded CUDA Version Testing: The CI pipeline now includes dedicated test runs for CUDA versions 12.6, 12.8, and 12.9, ensuring broader compatibility and stability across different NVIDIA environments.
- CI Configuration and Test Function Updates: The Jenkinsfile was updated to define specific Docker images for each new CUDA version, and the core unit test functions (for CPU AOT compile and GPU JIT tests) were refactored to accept a CUDA version parameter, making the test execution more flexible and version-aware.
- Simplified Import Test: The AOT build import test script was simplified to a basic 'import flashinfer' check, streamlining the verification process for successful library installation.
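To illustrate the "version-aware" refactor mentioned above, a parameterized test function in the Jenkinsfile could be shaped roughly like this. This is an illustrative sketch, not the actual pipeline contents: the script path and `sh` invocation details are assumptions, and only the `cuda_version` parameter pattern reflects the change described.

```groovy
// Sketch of a CUDA-version-aware test function (names and paths are illustrative).
def run_unittest_CPU_AOT_COMPILE(node_type, cuda_version) {
    node(node_type) {
        // Pick the docker image matching the requested CUDA version,
        // then run the AOT compile + import test inside it.
        def docker_run = "bash ci/bash.sh flashinfer/flashinfer-ci-${cuda_version}:latest"
        sh "${docker_run} ./run_aot_build_import_test.sh"  // hypothetical test script
    }
}
```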
Code Review
This pull request successfully adds CI tests for multiple CUDA versions (12.6, 12.8, 12.9) as intended. The changes primarily involve modifying the `Jenkinsfile` to parameterize test execution with the CUDA version and adding parallel stages for each version.
My review focuses on improving the maintainability of the `Jenkinsfile` by reducing significant code duplication. I've provided suggestions to refactor repeated logic into helper functions and use data structures like maps to make the pipeline script more concise and easier to extend. Additionally, I've pointed out a potential regression in a test script where a check was simplified, possibly weakening the test's effectiveness.
Jenkinsfile
Outdated
try {
    run_unittest_CPU_AOT_COMPILE('CPU-LARGE-SPOT', 'cu126')
} catch (Throwable ex) {
    echo 'Exception during SPOT run ' + ex.toString()
    if (is_last_build()) {
        echo 'Exception during SPOT run ' + ex.toString() + ' retry on-demand'
        currentBuild.result = 'SUCCESS'
        run_unittest_CPU_AOT_COMPILE('CPU-LARGE', 'cu126')
    } else {
        echo 'Exit since it is not last build'
        throw ex
    }
}
This `try-catch` block for handling SPOT instance failures is repeated for every test job in the parallel stage (18 times in total). This leads to a lot of duplicated code, making the Jenkinsfile hard to read and maintain.
You can refactor this logic into a helper function. This will significantly reduce code duplication and improve readability.
Here's an example of what the helper function could look like:
def runWithSpotRetry(spotClosure, onDemandClosure) {
try {
spotClosure()
} catch (Throwable ex) {
echo 'Exception during SPOT run ' + ex.toString()
if (is_last_build()) {
echo 'Exception during SPOT run ' + ex.toString() + ' retry on-demand'
currentBuild.result = 'SUCCESS'
onDemandClosure()
} else {
echo 'Exit since it is not last build'
throw ex
}
}
}
And you would use it like this:
'AOT-Build-Import-x86-64-cu126': {
runWithSpotRetry(
{ run_unittest_CPU_AOT_COMPILE('CPU-LARGE-SPOT', 'cu126') },
{ run_unittest_CPU_AOT_COMPILE('CPU-LARGE', 'cu126') }
)
},
Applying this pattern would make the pipeline script much more concise and maintainable. You could even take it a step further and generate the parallel stages dynamically in a loop.
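Taking the loop idea one step further, the parallel map itself could be built programmatically. The sketch below assumes the `runWithSpotRetry` helper and `run_unittest_CPU_AOT_COMPILE` function shown above; the stage names follow the existing `AOT-Build-Import-x86-64-<version>` convention, and using an `each` closure (rather than a plain `for` loop) ensures each closure captures its own `ver`:

```groovy
// Sketch: generate one parallel stage per CUDA version in a loop.
def cuda_versions = ['cu126', 'cu128', 'cu129']
def jobs = [:]
cuda_versions.each { ver ->
    jobs["AOT-Build-Import-x86-64-${ver}"] = {
        runWithSpotRetry(
            { run_unittest_CPU_AOT_COMPILE('CPU-LARGE-SPOT', ver) },
            { run_unittest_CPU_AOT_COMPILE('CPU-LARGE', ver) }
        )
    }
}
parallel jobs
```

Adding a new CUDA version would then be a one-line change to `cuda_versions`.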
docker_run_cu126 = "bash ci/bash.sh flashinfer/flashinfer-ci-cu126:latest"
docker_run_cu128 = "bash ci/bash.sh flashinfer/flashinfer-ci-cu128:latest"
docker_run_cu129 = "bash ci/bash.sh flashinfer/flashinfer-ci-cu129:latest"
To improve maintainability and make it easier to add or remove CUDA versions in the future, consider defining these Docker run commands in a Groovy map. This centralizes the configuration and works well with the suggestion to simplify the command selection logic in the test functions.
docker_runs = ["cu126": "bash ci/bash.sh flashinfer/flashinfer-ci-cu126:latest",
"cu128": "bash ci/bash.sh flashinfer/flashinfer-ci-cu128:latest",
"cu129": "bash ci/bash.sh flashinfer/flashinfer-ci-cu129:latest"]
def docker_run = ""
if (cuda_version == "cu126") {
    docker_run = docker_run_cu126
} else if (cuda_version == "cu128") {
    docker_run = docker_run_cu128
} else if (cuda_version == "cu129") {
    docker_run = docker_run_cu129
} else {
    error("Unknown CUDA version: ${cuda_version}")
}
This `if/else if` block for selecting the docker command is duplicated in `shard_run_unittest_GPU` (lines 170-179). By using a map for the docker commands (as suggested in another comment), you can simplify this logic to a single lookup. This removes code duplication and makes the script cleaner and easier to maintain.
def docker_run = docker_runs[cuda_version]
if (docker_run == null) {
error("Unknown CUDA version: ${cuda_version}")
}
@@ -14,4 +14,4 @@ pip install dist/*.whl
 # test import
 mkdir -p tmp
 cd tmp
-python -c "from flashinfer.page import gen_page_module; p = gen_page_module().aot_path; print(p); assert p.exists();"
+python -c "import flashinfer"
The AOT import test has been simplified to just `import flashinfer`. The previous version of this test was more specific and checked for the existence of AOT-compiled artifacts (`gen_page_module().aot_path.exists()`). The current check is less thorough and might not catch issues where AOT compilation fails silently for some modules. Consider restoring a more specific check to ensure the AOT artifacts are correctly generated and accessible.
-python -c "import flashinfer"
+python -c "from flashinfer.page import gen_page_module; p = gen_page_module().aot_path; print(p); assert p.exists()"
cool
Thanks so much for all the great effort!
@yzh119 is it ready, and when will we release a new version?
Still one error left.
Should be fixed in d41a6c1; we will release the post-fix as soon as CI passes.
📌 Description
v0.2.14 cannot be used by sglang because of cu126 compilation issues; the release fixes the issue.
🔍 Related Issues
#1560
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.
> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).
🧪 Tests
- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).
📌 Description
As reported in #1557 (comment), our unit tests did not cover the CUDA 12.6 environment; this PR fixes the issue by adding cu126/cu128/cu129 unit tests to CI.
🔍 Related Issues
#1557 (comment)
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.
🧪 Tests
- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).
Reviewer Notes
cc @yongwww @zhyncs