Add vllm to cpu inferencing Containerfile #1677
Conversation
Reviewer's Guide

Introduces a new layer in the Containerfile to build and install vllm on x86_64 hosts by adding a RUN command and a dedicated script that clones a pinned vllm commit, installs dependencies, and compiles for CPU.

Flow diagram for build-vllm.sh script execution:

```mermaid
flowchart TD
    A[Start build-vllm.sh] --> B{Is uname -m x86_64?}
    B -- Yes --> C[Clone vllm repo]
    C --> D[Reset to pinned commit]
    D --> E[Install Python dependencies]
    E --> F[Install vllm requirements]
    F --> G[Build and install vllm for CPU]
    B -- No --> H[Exit script]
    G --> I[Script complete]
    H --> I
```
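For orientation, here is a minimal sketch of a script matching this flow, reconstructed from the diff excerpts quoted in the review comments below. The shebang and `set` flags are assumptions, and `cd vllm` is placed before the `git reset` (the quoted diff has them the other way around, which Comment 1 below flags).

```sh
#!/bin/bash
set -euo pipefail

main() {
  local uname_m
  uname_m="$(uname -m)"
  if [ "$uname_m" = "x86_64" ]; then
    local vllm_sha="a5dd03c1ebc5e4f56f3c9d3dc0436e9c582c978f"
    git clone https://github.com/vllm-project/vllm
    cd vllm
    git reset --hard "$vllm_sha"   # pin to a known commit for reproducible builds
    # build-time Python dependencies, then the CPU requirement set
    uv pip install wheel packaging ninja "setuptools>=49.4.0" numpy \
      typing-extensions pillow setuptools-scm grpcio==1.68.1 protobuf bitsandbytes
    uv pip install -v -r requirements-cpu.txt \
      --extra-index-url https://download.pytorch.org/whl/cpu
    VLLM_TARGET_DEVICE=cpu python setup.py install   # compile the CPU backend
  fi
}

main "$@"
```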
Summary of Changes
Hello @ericcurtin, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request integrates the vLLM library into the ramalama container image. It achieves this by adding a new build step to the Containerfile that executes a newly created script responsible for cloning and installing vLLM. The description notes that this integration is a temporary measure and will eventually be moved to a separate layer.
Highlights
- **vLLM Integration:** A new `RUN` command has been added to the `container-images/ramalama/Containerfile` to execute a script that installs vLLM.
- **New Build Script for vLLM:** A new shell script, `container-images/scripts/build-vllm.sh`, has been introduced. This script handles cloning the `vllm-project/vllm` repository, checking out a specific commit (`a5dd03c1ebc5e4f56f3c9d3dc0436e9c582c978f`), and installing it using `uv pip` with CPU-specific dependencies. It includes a check to ensure the installation only proceeds on the x86_64 architecture.
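The Containerfile side of this change is not quoted in the summary, so the following is only a hypothetical sketch of what such a layer could look like; the COPY source and destination paths are invented for illustration.

```dockerfile
# Hypothetical sketch only: paths and invocation may differ from the real
# container-images/ramalama/Containerfile change.
COPY container-images/scripts/build-vllm.sh /usr/local/bin/build-vllm.sh
RUN /usr/local/bin/build-vllm.sh
```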
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Hey @ericcurtin - I've reviewed your changes - here's some feedback:
- Consider using git clone --depth 1 to speed up cloning and reduce image size.
- Add a cleanup step (e.g., rm -rf vllm) after installation to remove build artifacts and shrink the final image.
- Handle unsupported architectures explicitly by exiting with an error or logging a message instead of silently skipping the vllm build.
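Taken together, those three suggestions could look roughly like the sketch below. This is not the actual patch: the shallow-fetch-by-SHA approach and the message wording are assumptions (fetching the pinned commit explicitly is needed because a `--depth 1` clone only contains the branch tip).

```sh
# Sketch of a revised main() applying the review suggestions: fail loudly on
# unsupported architectures, clone shallowly, and clean up the source tree.
main() {
  if [ "$(uname -m)" != "x86_64" ]; then
    echo "build-vllm.sh: unsupported architecture $(uname -m), skipping vllm build" >&2
    exit 1   # or 'return 0' if a non-fatal skip is preferred
  fi

  local vllm_sha="a5dd03c1ebc5e4f56f3c9d3dc0436e9c582c978f"
  git clone --depth 1 https://github.com/vllm-project/vllm
  cd vllm
  # the pinned commit is unlikely to be the shallow clone's tip, so fetch it explicitly
  git fetch --depth 1 origin "$vllm_sha"
  git checkout "$vllm_sha"

  uv pip install wheel packaging ninja "setuptools>=49.4.0" numpy \
    typing-extensions pillow setuptools-scm grpcio==1.68.1 protobuf bitsandbytes
  uv pip install -v -r requirements-cpu.txt \
    --extra-index-url https://download.pytorch.org/whl/cpu
  VLLM_TARGET_DEVICE=cpu python setup.py install

  cd .. && rm -rf vllm   # keep the checkout and build artifacts out of the layer
}
```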
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider using git clone --depth 1 to speed up cloning and reduce image size.
- Add a cleanup step (e.g., rm -rf vllm) after installation to remove build artifacts and shrink the final image.
- Handle unsupported architectures explicitly by exiting with an error or logging a message instead of silently skipping the vllm build.
## Individual Comments
### Comment 1
<location> `container-images/scripts/build-vllm.sh:10` </location>
<code_context>
+ uname_m="$(uname -m)"
+ if [ "$uname_m" = "x86_64" ]; then
+ local vllm_sha="a5dd03c1ebc5e4f56f3c9d3dc0436e9c582c978f"
+ git clone https://github.com/vllm-project/vllm
+ git reset --hard "$vllm_sha"
+ cd vllm
+ uv pip install wheel packaging ninja "setuptools>=49.4.0" numpy typing-extensions pillow setuptools-scm grpcio==1.68.1 protobuf bitsandbytes
</code_context>
<issue_to_address>
Cloning the repository and resetting without changing directory may not work as intended.
'git reset --hard' should be run inside the 'vllm' directory. Add 'cd vllm' before this command to ensure it targets the correct repository.
</issue_to_address>
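In other words, the reset has to run inside the freshly created clone, along these lines:

```sh
git clone https://github.com/vllm-project/vllm
cd vllm                        # enter the clone first...
git reset --hard "$vllm_sha"   # ...so the reset pins the vllm checkout, not the caller's directory
```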
### Comment 2
<location> `container-images/scripts/build-vllm.sh:15` </location>
<code_context>
+ cd vllm
+ uv pip install wheel packaging ninja "setuptools>=49.4.0" numpy typing-extensions pillow setuptools-scm grpcio==1.68.1 protobuf bitsandbytes
+ uv pip install -v -r requirements-cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu
+ VLLM_TARGET_DEVICE=cpu python setup.py install
+ fi
+}
</code_context>
<issue_to_address>
Using 'python setup.py install' is deprecated in favor of 'pip install .'.
'python setup.py install' is deprecated and may cause inconsistent installations. Use 'pip install .' instead for improved compatibility and environment management.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
VLLM_TARGET_DEVICE=cpu python setup.py install
=======
VLLM_TARGET_DEVICE=cpu uv pip install .
>>>>>>> REPLACE
</suggested_fix>
### Comment 3
<location> `container-images/scripts/build-vllm.sh:19` </location>
<code_context>
+ fi
+}
+
+main "$@"
+
</code_context>
<issue_to_address>
Script only supports x86_64 architecture.
If this is intentional, please add a warning or exit message for unsupported architectures to improve clarity.
</issue_to_address>
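If skipping on non-x86_64 hosts is intentional (so the same Containerfile still builds on other architectures), an explicit notice in an else branch would make that clear. A minimal sketch, with `build_vllm` standing in for the existing steps:

```sh
if [ "$(uname -m)" = "x86_64" ]; then
  build_vllm   # hypothetical helper wrapping the clone/install steps above
else
  echo "build-vllm.sh: no CPU vllm build for $(uname -m) yet, skipping" >&2
fi
```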
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Code Review
This pull request adds vLLM support to the container image by introducing a new build script. The review identified a few issues in this script that may cause the build to fail. Suggestions have been provided to improve the build's efficiency and maintainability.
dcc4ce3 to 1597153 (Compare)
a698bb5 to 5158766 (Compare)
To be built upon "ramalama" image

Signed-off-by: Eric Curtin <[email protected]>
@mikebonnet PTAL
@ericcurtin built for all arches, correct?
I only tested x86_64 and aarch64. But there's a lot of active enablement work going on for ppc as well; I got pinged to take a look at a ppc PR this morning in upstream vllm (the joys of open source: you open one PR to enable aarch64 CPU inferencing and people then poke you to review tangentially related things). I also know there has been some s390 enablement work.

Might be worth attempting to build for all 4 arches. If ppc or s390 fail for any reason, maybe abandon them until they are more mature.

I suspect vllm-cuda and vllm-rocm will try to duplicate all the cuda and rocm libraries with other versions specific to vllm. Apparently Intel works; they seem to call it xpu? No vulkan support, which is a pity.
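If the script does grow to cover all four architectures, a case statement would keep the per-arch decisions in one place. A rough sketch, where `build_vllm_cpu` is a hypothetical helper and the ppc64le/s390x handling reflects the maturity caveats above:

```sh
case "$(uname -m)" in
  x86_64|aarch64)
    build_vllm_cpu            # the two arches that have been tested so far
    ;;
  ppc64le|s390x)
    echo "vllm CPU enablement for $(uname -m) is still maturing upstream, skipping" >&2
    ;;
  *)
    echo "no vllm CPU build attempted for $(uname -m)" >&2
    ;;
esac
```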
If vulkan were added to vLLM... vLLM would have almost as much accelerated hardware support as llama.cpp; it would only be missing native macOS, which could be worked around with @slp's krunkit solution. That's ignoring things like cann and musa, as that hardware isn't crazy common.