
Conversation


@rhatdan rhatdan commented Jun 21, 2025

Summary by Sourcery

Eliminate the external libexec wrapper (ramalama-serve-core) by inlining its logic into the code and cleaning up related packaging, docs, and CLI options.

Enhancements:

  • Extract serve argument construction into new vllm_serve and llama_serve methods in model.py
  • Refactor build_exec_args_serve to delegate to the new helper methods instead of inlining logic
  • Update Kubernetes stack generator to invoke llama-server directly with separate command and args lists
  • Hide the --max-model-len CLI flag from help output, keeping it as a suppressed alias of the unified --ctx-size option

Build:

  • Strip out libexec wrapper files from the project’s packaging configuration

Documentation:

  • Update the man page and YAML examples to reference direct llama-server invocation instead of the wrapper

Chores:

  • Remove the get_ramalama_core_path and get_cmd_with_wrapper functions
  • Delete the libexec/ramalama/ramalama-serve-core script and its packaging entries in pyproject.toml


sourcery-ai bot commented Jun 21, 2025

Reviewer's Guide

This PR removes the legacy libexec wrapper (ramalama-serve-core), refactors the argument-building logic into two dedicated methods (vllm_serve and llama_serve), and updates all call sites, CLI options, stack definitions, documentation, and packaging entries to use the new direct command invocations.

Class diagram for refactored argument-building logic in Model

```mermaid
classDiagram
    class Model {
        +validate_args(args)
        +build_exec_args_bench(args, model_path)
        +build_exec_args_serve(args, exec_model_path, chat_template_path="", mmproj_path="")
        +vllm_serve(args, exec_model_path)
        +llama_serve(args, exec_model_path, chat_template_path, mmproj_path)
    }
    note for Model "build_exec_args_serve() now delegates to the new vllm_serve() or llama_serve()"
    note for Model "Removed: get_ramalama_core_path(); get_cmd_with_wrapper() (previously in common.py)"
```
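To make the delegation concrete, here is a minimal sketch of the dispatch described above. The method bodies are illustrative, not the PR's actual model.py code; only the fragments quoted in the review comments further down (the vLLM argument list, the ["llama-server"] prefix, and the debug/web-UI flags) come from the diff, and the runtime check is an assumption.

```python
# Minimal sketch of the refactored dispatch; bodies are illustrative, and the
# `args.runtime == "vllm"` check is an assumption rather than the exact code.
class Model:
    def build_exec_args_serve(self, args, exec_model_path, chat_template_path="", mmproj_path=""):
        # Delegate to a runtime-specific builder instead of assembling args inline.
        if args.runtime == "vllm":
            return self.vllm_serve(args, exec_model_path)
        return self.llama_serve(args, exec_model_path, chat_template_path, mmproj_path)

    def vllm_serve(self, args, exec_model_path):
        # Mirrors the vLLM snippet quoted in the review comments below.
        return [
            "--model",
            exec_model_path,
            "--port",
            args.port,
            "--max-sequence-length",
            f"{args.context}",
        ] + args.runtime_args

    def llama_serve(self, args, exec_model_path, chat_template_path, mmproj_path):
        # The real method also wires up draft models, RPC, chat templates, mmproj,
        # and flash-attention flags; only a skeleton is shown here.
        exec_args = ["llama-server", "--model", exec_model_path]
        if args.debug:
            exec_args += ["-v"]
        if hasattr(args, "webui") and args.webui == "off":
            exec_args += ["--no-webui"]
        return exec_args
```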

File-Level Changes

Eliminate the libexec wrapper and its usage
  • Deleted get_ramalama_core_path and its call sites
  • Removed get_cmd_with_wrapper from common.py
  • Dropped libexec/ramalama/ramalama-serve-core from the packaging configuration and deleted the script from disk
  Files: ramalama/model.py, ramalama/common.py, pyproject.toml, libexec/ramalama/ramalama-serve-core

Extract argument builders into vllm_serve and llama_serve
  • Created vllm_serve to encapsulate the vLLM runtime arguments
  • Created llama_serve to build the llama-server arguments, including draft-model, RPC, debug, web-UI, and flash-attention flags
  • Replaced the inline serve-args assembly in build_exec_args_serve with calls to the two helpers
  Files: ramalama/model.py

Adjust CLI context option aliasing
  • Moved --max-model-len under --ctx-size as a suppressed alias (a sketch of this pattern follows the list)
  • Preserved a single context destination for backward compatibility
  Files: ramalama/cli.py

Update the stack definition for direct command invocation
  • Changed llama_cmd from a list to a string
  • Introduced a separate llama_args list
  • Updated the container command and args fields to use llama_cmd and llama_args
  Files: ramalama/stack.py

Revise documentation examples to match the new invocation
  • Removed a stray blank line in the serve docs
  • Updated the example YAML to use "llama-server" as the command together with the new args format
  Files: docs/ramalama-serve.1.md
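As a sketch of the suppressed-alias pattern referenced above (plain argparse; the option names match the PR, but the exact wiring in ramalama/cli.py may differ):

```python
# Hedged sketch: --max-model-len shares a dest with --ctx-size and is hidden
# from --help via argparse.SUPPRESS.  Defaults and types here are placeholders.
import argparse

parser = argparse.ArgumentParser(prog="ramalama serve")
parser.add_argument(
    "--ctx-size",
    dest="context",
    default="2048",
    help="size of the prompt context",
)
parser.add_argument(
    "--max-model-len",
    dest="context",              # same destination as --ctx-size
    help=argparse.SUPPRESS,      # still accepted, but not shown in help output
)

args = parser.parse_args(["--max-model-len", "4096"])
print(args.context)              # -> 4096
```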



@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @rhatdan, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on streamlining the application's execution model by removing the reliance on a dedicated libexec program for the llama-server component. This involves significant refactoring of how server commands are constructed and invoked, along with corresponding updates to documentation, build configurations, and deployment examples, simplifying the overall architecture.

Highlights

  • Elimination of libexec dependency: I've removed the explicit handling and installation of the ramalama-serve-core executable in the libexec directory. The system now expects llama-server to be directly available in the execution environment (e.g., PATH or the container entrypoint); a quick availability check is sketched after this list.
  • Refactoring of server argument construction: The logic for building command-line arguments for both llama-server and vllm server has been extracted into dedicated, more modular methods (llama_serve and vllm_serve) within ramalama/model.py.
  • Cleanup of argument parsing and utility functions: The --max-model-len option is now a suppressed alias of --ctx-size, hidden from help output. Additionally, the get_cmd_with_wrapper utility function and the related sysconfig import, which were used to locate the libexec program, have been removed.
  • Documentation and deployment manifest updates: I've updated the ramalama-serve.1.md documentation and the generated Kubernetes/OpenShift deployment examples in ramalama/stack.py to reflect the new command invocation structure, removing all references to the libexec path.
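
With the wrapper removed, the only runtime requirement is that llama-server resolves in the execution environment; one quick way to verify that assumption (not part of this PR) is:

```python
# Not from the PR: sanity-check that the llama-server binary the serve path now
# invokes directly is actually reachable on PATH.
import shutil

if shutil.which("llama-server") is None:
    raise SystemExit(
        "llama-server not found on PATH; run inside the ramalama container image "
        "or install llama.cpp locally"
    )
```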


rhatdan commented Jun 21, 2025

Replaces: #1504


@sourcery-ai sourcery-ai bot left a comment


Hey @rhatdan - I've reviewed your changes and they look great!


In ramalama/model.py (around line 533):

```python
        if args.debug:
            exec_args += ["-v"]

        if hasattr(args, "webui") and args.webui == "off":
```

suggestion (bug_risk): Use of hasattr for args.webui may mask argument parsing issues.

Consider ensuring 'webui' is always set during argument parsing to avoid masking missing arguments.

Suggested implementation:

```python
        if args.webui == "off":
            exec_args.extend(["--no-webui"])
```

You must ensure that the argument parser (likely in your CLI or main entrypoint) always sets args.webui, for example by adding:

```python
parser.add_argument("--webui", choices=["on", "off"], default="on", help="Enable or disable the web UI")
```

This will guarantee that args.webui is always present and set to either "on" or "off", so the direct check in this file is safe.

Comment on lines 481 to 488
```python
        exec_args = [
            "--model",
            exec_model_path,
            "--port",
            args.port,
            "--max-sequence-length",
            f"{args.context}",
        ] + args.runtime_args
```

issue (code-quality): Inline variable that is immediately returned (inline-immediately-returned-variable)

```python
        return exec_args

    def llama_serve(self, args, exec_model_path, chat_template_path, mmproj_path):
        exec_args = ["llama-server"]
```

issue (code-quality): We've found these issues:


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request removes the ramalama-serve-core wrapper, inlines its logic, and cleans up related packaging, documentation, and CLI options. The changes look good overall, with improvements in code clarity and maintainability. I've provided some minor suggestions for documentation and code style.

Comment on lines 127 to 128 of ramalama/stack.py:

```
command: [{llama_cmd}]
args: {llama_args}\
```

Severity: medium

Using a list for the command field is generally preferred for clarity and to avoid shell injection vulnerabilities. However, since llama_cmd is now a string, it should be enclosed in quotes to ensure it's treated as a single command.

Suggested change:

```diff
-command: [{llama_cmd}]
-args: {llama_args}\
+image: {self.args.image}
+command: ["{llama_cmd}"]
```

@rhatdan rhatdan force-pushed the chat branch 2 times, most recently from 9bbb194 to 6ea3caf on June 22, 2025 at 15:56
rhatdan added 2 commits June 23, 2025 11:44

  • This fixes make validate to not complain about the --ctx-size option.
  • No reason to have this available in display, since this is only for users assuming vllm options.

Signed-off-by: Daniel J Walsh <[email protected]>
Signed-off-by: Daniel J Walsh <[email protected]>
@rhatdan rhatdan merged commit 1ee66c0 into containers:main Jun 23, 2025
20 of 22 checks passed