Fixup to work with llama-stack #1588
Conversation
Reviewer's Guide

This PR modifies the ramalama stack to be compatible with llama-stack by switching the network binding to 0.0.0.0, refactoring the llama-server command and arguments into an inline YAML list with stringified parameters, standardizing the llama stack run invocation to exec form in both the Kubernetes spec and the Containerfile, and adding attribute checks to the chat kill routine for safer termination.

Sequence diagram for safer process/container termination in Chat.kills():

sequenceDiagram
participant Chat
participant OS
participant ContainerManager
Chat->>Chat: kills()
alt args has pid2kill
Chat->>OS: os.kill(pid2kill, SIGINT)
Chat->>OS: os.kill(pid2kill, SIGTERM)
Chat->>OS: os.kill(pid2kill, SIGKILL)
else args has name
Chat->>ContainerManager: stop_container(args, name)
end
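The guarded flow in the diagram can be illustrated with a short Python sketch. This is a minimal illustration of the described behavior, not the actual ramalama implementation: stop_container is a stand-in for whatever container-manager call the project uses, and the brief pause between signals is an assumption.

```python
# Minimal sketch (not the ramalama implementation) of the guarded termination
# flow from the sequence diagram: check that the optional attributes exist
# before using them, and escalate signals when killing by PID.
import os
import signal
import time


def stop_container(args, name):
    # Placeholder for the real container-manager call (e.g. something that
    # runs `podman stop <name>`); defined here only so the sketch runs.
    print(f"stopping container {name}")


def kills(args):
    if hasattr(args, "pid2kill") and args.pid2kill:
        # Polite interrupt first, then terminate, then force kill.
        for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGKILL):
            try:
                os.kill(args.pid2kill, sig)
            except ProcessLookupError:
                break  # the process already exited
            time.sleep(0.1)
    elif hasattr(args, "name") and args.name:
        stop_container(args, args.name)
```

In the real code path args comes from the CLI parser, so pid2kill and name may legitimately be absent, which is why the hasattr guards matter.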
Class diagram for updated Stack and Chat classes:

classDiagram
class Stack {
- args
- name
- host
- model
- stack_image
+ __init__(args)
+ generate()
}
class Chat {
- args
+ _req()
+ kills()
+ loop()
}
Stack <|-- Chat
%% Highlighted changes:
%% - Stack.host now defaults to "0.0.0.0" instead of "127.0.0.1"
%% - Stack.generate() refactored llama-server command/args handling
%% - Chat.kills() now checks for attribute existence before accessing args.pid2kill and args.name
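For orientation, the diagram corresponds roughly to a class skeleton like the following. This is an illustrative sketch only, not the real ramalama/stack.py; the attribute sources and defaults other than the 0.0.0.0 host binding are assumptions.

```python
# Illustrative skeleton of the Stack class from the diagram; not the real
# ramalama/stack.py. Attribute sources other than the host default are guesses.
class Stack:
    def __init__(self, args):
        self.args = args
        self.name = getattr(args, "name", None)          # assumed to come from the CLI args
        self.model = getattr(args, "model", None)        # assumed attribute name
        self.stack_image = getattr(args, "image", None)  # assumed attribute name
        # Changed by this PR from "127.0.0.1" so other containers in the pod
        # can reach the model server.
        self.host = "0.0.0.0"

    def generate(self):
        # Emits the container spec; see the sketch after the Highlights list
        # below for what the refactored command/args handling might look like.
        raise NotImplementedError
```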
Summary of Changes
Hello @rhatdan, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the compatibility and operational robustness of the ramalama application within a llama-stack containerized deployment. The changes primarily involve adopting container best practices for command execution, improving argument handling for increased stability, and configuring network settings to facilitate seamless inter-container communication, thereby ensuring the application functions correctly in its intended environment.
Highlights
- Container Command Execution: Updated CMD instructions in container-images/llama-stack/Containerfile and args definitions in ramalama/stack.py for the llama-stack container to use the exec form (JSON array) instead of the shell form. This is a best practice for container images, improving clarity and avoiding shell-related issues (a sketch follows this list).
- Robust Argument Handling: Added hasattr checks in ramalama/chat.py before accessing self.args.pid2kill and self.args.name. This prevents AttributeError exceptions if these optional arguments are not provided, making the application more robust.
- Network Accessibility for Model Server: Changed the default host binding for the llama-server from 127.0.0.1 (localhost) to 0.0.0.0 (all network interfaces) in ramalama/stack.py. This ensures the model server is accessible from other containers or external hosts within the llama-stack environment.
- Inter-Container Communication: Refactored the generation of llama-server arguments in ramalama/stack.py to be directly embedded as a YAML list string, and updated the RAMALAMA_URL environment variable to use the dynamic self.host (now 0.0.0.0) instead of hardcoding 127.0.0.1. This ensures correct communication between the llama-stack and model-server containers.
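As a concrete but hypothetical illustration of the exec-form and stringified-argument points above, the generated container entries might be assembled along these lines. The flag spellings follow llama-server's common CLI options; the function names, dict layout, and environment structure are assumptions rather than the exact ramalama/stack.py code.

```python
# Sketch of exec-form command/args construction for the generated container
# spec; not the actual ramalama/stack.py implementation.
def llama_server_container(host, port, model_path, ctx_size, threads):
    return {
        "name": "model-server",
        "command": ["llama-server"],           # exec form: no shell wrapper
        "args": [
            "--host", host,                    # now 0.0.0.0 so peer containers can connect
            "--port", str(port),               # numeric values stringified for the CLI
            "--model", model_path,
            "--ctx-size", str(ctx_size),
            "--threads", str(threads),
        ],
    }


def llama_stack_container(host, port):
    return {
        "name": "llama-stack",
        # Exec form, mirroring the Containerfile CMD; the run-config argument
        # is omitted in this sketch.
        "command": ["llama", "stack", "run"],
        "env": [
            # Points the stack at the model server.
            {"name": "RAMALAMA_URL", "value": f"http://{host}:{port}"},
        ],
    }
```

Keeping every argument a string avoids mixed-type lists when the spec is serialized to YAML, which is the "stringified parameters" point in the reviewer's guide.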
Code Review
This pull request adapts the codebase for compatibility with llama-stack. The changes include updating the server host binding for external accessibility, converting argument types for CLI consistency, and refactoring container command definitions to the exec form, which are all positive improvements.

I've found one issue in ramalama/stack.py where the client connection URL is being constructed with 0.0.0.0, which is incorrect and will likely cause connection failures between containers in the pod. I've provided a suggestion to fix this.

The other changes, such as adding hasattr checks in ramalama/chat.py and updating the Containerfile CMD, improve the robustness and follow best practices.
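The distinction the review draws is between a bind address and a connect address; a tiny sketch makes it concrete (the helper names are hypothetical, not ramalama functions):

```python
# 0.0.0.0 is a wildcard *bind* address: it tells the server to listen on all
# interfaces. It is not a routable destination, so client URLs should use a
# concrete address instead.
def server_bind_address() -> str:
    return "0.0.0.0"


def client_url(port: int) -> str:
    # Containers in the same pod share a network namespace, so the stack can
    # reach the model server via localhost; dialing 0.0.0.0 is not reliable.
    return f"http://127.0.0.1:{port}"
```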
d94b61f to 2df4f6c
Adapt ramalama stack and chat modules for compatibility with llama-stack by updating host binding, argument formatting, and command invocation patterns, and add robust attribute checks in the chat utility.

Bug Fixes:
- Add hasattr checks around optional args (pid2kill, name) in chat kills() to prevent attribute errors

Enhancements:
- Bind model server to 0.0.0.0 instead of localhost for external accessibility
- Convert port, context size, and thread count arguments to strings for consistent CLI usage
- Reformat container YAML to use JSON array and multiline args for llama-server and llama-stack commands
- Update Containerfile CMD to JSON exec form for llama-stack entrypoint

Signed-off-by: Daniel J Walsh <[email protected]>

Summary by Sourcery

Adapt ramalama stack and chat modules for compatibility with llama-stack by updating host binding, argument formatting, and command invocation patterns, and add robust attribute checks in the chat utility.