
Adds the ability to include vision based context to chat via --rag #1661


Merged
merged 2 commits into containers:main on Jul 7, 2025

Conversation

@ieaves ieaves (Collaborator) commented Jul 3, 2025

This PR addresses issue #150, allowing users to pass image-based content through to an LLM over the chat interface.

Summary by Sourcery

Implement context-aware chat enhancements by adding vision-based RAG support, revamp environment/config parsing with nested and legacy variable handling, refactor file processing into dedicated loader managers, extend CLI with model listing and selection flags, and shore up build, CI, and test coverage around these changes.

New Features:

  • Enable vision-based context in chat via --rag for mixed text and image files
  • Add --list option to list available models and --model flag in chat CLI
  • Introduce load_env_config to parse nested and legacy RAMALAMA environment variables

Enhancements:

  • Refactor file handling into a new file_loaders module with TextFileManager and ImageFileManager
  • Add boolean coercion, nested dataclass support, and deep_merge in the layered config system
  • Replace FileUpLoader with OpanAIChatAPIMessageBuilder for constructing chat context messages
  • Centralize API key header injection via add_api_key (a rough sketch follows this list) and simplify default engine detection logic
  • Update Makefile and pyproject configuration with improved formatter, linter, and pre-commit settings
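
A rough sketch of what the add_api_key helper might look like (the summary doesn't show its real signature, so the parameter names below are assumptions):

```python
def add_api_key(headers: dict[str, str], api_key: str | None) -> dict[str, str]:
    """Hypothetical shape: attach a bearer token when an API key is configured."""
    if api_key:
        # OpenAI-compatible endpoints expect "Authorization: Bearer <key>".
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```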

Build:

  • Extend GitHub Actions and Tekton pipeline timeouts and add wait-for-parent-image support
  • Add pre-commit hooks and configure ruff, codespell, black, and isort rules

CI:

  • Increase timeout for lint and test jobs
  • Enhance Tekton pipelines with a wait-for-parent-image step

Documentation:

  • Document new --list and --model CLI options in the man page and help output

Tests:

  • Add comprehensive unit tests for load_env_config, deep_merge, build_subconfigs, and LayeredMixin
  • Update and expand tests for the new file_loaders and OpanAIChatAPIMessageBuilder
  • Enhance integration tests for chat RAG context and image support

@sourcery-ai sourcery-ai bot (Contributor) commented Jul 3, 2025

Reviewer's Guide

This PR refactors the file ingestion and chat context pipeline by introducing a new file loader abstraction with text and image managers, and wires vision-based context into the chat shell via a RAG flag. It also enhances configuration layering and environment variable parsing with type coercion and nested keys, updates the CLI to support listing models and specifying a model, and aligns tests and CI/tooling (Makefile, pre-commit, ruff, Tekton pipelines) with these changes.

Sequence diagram for chat context ingestion with vision-based RAG

sequenceDiagram
    actor User
    participant CLI as CLI
    participant RamaLamaShell
    participant OpanAIChatAPIMessageBuilder
    participant TextFileManager
    participant ImageFileManager

    User->>CLI: ramalama chat --rag <context>
    CLI->>RamaLamaShell: Initialize shell with args
    RamaLamaShell->>OpanAIChatAPIMessageBuilder: builder.load(context)
    OpanAIChatAPIMessageBuilder->>TextFileManager: load(text files)
    OpanAIChatAPIMessageBuilder->>ImageFileManager: load(image files)
    OpanAIChatAPIMessageBuilder-->>RamaLamaShell: messages (text and image context)
    RamaLamaShell->>RamaLamaShell: Append messages to conversation_history
    RamaLamaShell->>CLI: Continue chat loop

Class diagram for new file loader abstraction (Text and Image Managers)

classDiagram
    class BaseFileLoader {
        <<abstract>>
        +static load(file: str) str
        +static file_extensions() set[str]
    }
    class TXTFileLoader {
        +static load(file: str) str
        +static file_extensions() set[str]
    }
    class BasicImageFileLoader {
        +static load(file: str) str
        +static file_extensions() set[str]
    }
    class PDFFileLoader {
        +static load(file: str) str
        +static file_extensions() set[str]
    }
    class BaseFileManager {
        <<abstract>>
        -loaders: dict
        +_get_loader(file: str) BaseFileLoader
        +abstract load()
        +abstract get_loaders() List[Type[BaseFileLoader]]
    }
    class TextFileManager {
        -document_delimiter: Template
        +load(files: list[str]) str
        +get_loaders() List[Type[BaseFileLoader]]
    }
    class ImageFileManager {
        +load(files: list[str]) list[str]
        +get_loaders() List[Type[BaseFileLoader]]
    }
    class OpanAIChatAPIMessageBuilder {
        -text_manager: TextFileManager
        -image_manager: ImageFileManager
        +partition_files(file_path: str) tuple
        +supported_extensions() set[str]
        +load(file_path: str) list[dict]
    }

    BaseFileLoader <|-- TXTFileLoader
    BaseFileLoader <|-- BasicImageFileLoader
    BaseFileLoader <|-- PDFFileLoader
    BaseFileManager <|-- TextFileManager
    BaseFileManager <|-- ImageFileManager
    OpanAIChatAPIMessageBuilder --> TextFileManager
    OpanAIChatAPIMessageBuilder --> ImageFileManager
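
Read alongside the diagram, a condensed Python sketch of the pattern (the loader-registration dict is quoted verbatim from the diff later in this thread; the TXT extension set is an illustrative subset):

```python
import os
from abc import ABC, abstractmethod
from typing import List, Type


class BaseFileLoader(ABC):
    @staticmethod
    @abstractmethod
    def load(file: str) -> str:
        """Return the file's content in a chat-ready form."""

    @staticmethod
    @abstractmethod
    def file_extensions() -> set:
        """Extensions (with leading dot) this loader accepts."""


class TXTFileLoader(BaseFileLoader):
    @staticmethod
    def load(file: str) -> str:
        with open(file, "r", encoding="utf-8") as f:
            return f.read()

    @staticmethod
    def file_extensions() -> set:
        return {".txt", ".md"}  # illustrative subset


class BaseFileManager(ABC):
    def __init__(self):
        # One loader instance registered per supported extension (verbatim from the PR).
        self.loaders = {ext.lower(): loader() for loader in self.get_loaders() for ext in loader.file_extensions()}

    def _get_loader(self, file: str) -> BaseFileLoader:
        return self.loaders[os.path.splitext(file)[1].lower()]

    @abstractmethod
    def load(self):
        pass

    @classmethod
    @abstractmethod
    def get_loaders(cls) -> List[Type[BaseFileLoader]]:
        pass


class TextFileManager(BaseFileManager):
    def load(self, files: list) -> str:
        # Simplified: the real manager wraps each document in a delimiter template.
        return "\n".join(self._get_loader(f).load(f) for f in files)

    @classmethod
    def get_loaders(cls) -> List[Type[BaseFileLoader]]:
        return [TXTFileLoader]
```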

Class diagram for updated configuration layering and type coercion

classDiagram
    class UserConfig {
        no_missing_gpu_prompt: bool = False
        __post_init__()
    }
    class RamalamaSettings {
        config_file: str | None = None
    }
    class BaseConfig {
        container: bool
        engine: str
        image: str
        store: str
        api: str
        transport: str
        ocr: bool
        default_image: str
        user: UserConfig
        settings: RamalamaSettings
        __post_init__()
    }
    class LayeredMixin {
        _fields: set
        _layers: list
        __init__(*layers)
        is_set(name: str) bool
    }
    LayeredMixin <|-- BaseConfig
    BaseConfig o-- UserConfig
    BaseConfig o-- RamalamaSettings
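
The deep_merge mentioned above can be pictured as follows (a minimal sketch, not necessarily the PR's exact implementation):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base; override wins on scalar conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Later layers override earlier ones without clobbering whole subtrees.
defaults = {"engine": "podman", "user": {"no_missing_gpu_prompt": False}}
env_layer = {"user": {"no_missing_gpu_prompt": True}}
assert deep_merge(defaults, env_layer) == {
    "engine": "podman",
    "user": {"no_missing_gpu_prompt": True},
}
```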

File-Level Changes

Each change below is listed with its details and the affected files.
Introduce a modular file loader framework for text, image, and PDF files
  • Add BaseFileLoader and file_types modules (txt, image, pdf)
  • Implement TextFileManager and ImageFileManager in file_manager.py
  • Remove legacy file_upload code and update related imports
  • Expand tests to validate loader behavior and supported extensions
ramalama/file_loaders/file_manager.py
ramalama/file_loaders/file_types/base.py
ramalama/file_loaders/file_types/txt.py
ramalama/file_loaders/file_types/image.py
ramalama/file_loaders/file_types/pdf.py
test/unit/test_file_upload.py
test/unit/test_file_upload_with_data.py
test/unit/test_file_upload_integration.py
Add RAG support in chat via OpanAIChatAPIMessageBuilder
  • Implement OpanAIChatAPIMessageBuilder to partition and load text/image files into messages
  • Refactor RamaLamaShell.prep_rag_message to use the new builder
  • Introduce warnings for unsupported files
  • Extend integration tests to cover image-based chat context
ramalama/chat.py
ramalama/file_loaders/file_manager.py
test/unit/test_file_upload_integration.py
test/system/040-serve.bats
Enhance configuration system with nested env loading, type coercion, and deep merging
  • Add coerce_to_bool, UserConfig, RamalamaSettings in config.py
  • Rewrite load_env_config for nested keys and type conversions
  • Improve load_file_config to include settings and deep_merge support
  • Extend layered_config with deep_merge, build_subconfigs, and refined LayeredMixin
  • Add comprehensive tests for env loading and layered configs
ramalama/config.py
ramalama/layered_config.py
test/unit/test_config.py
test/unit/test_layered_config.py
test/unit/conftest.py
Update CLI and parser to support listing models, specifying model, and API key handling
  • Add --list/--ls and --model flags in CLI parser
  • Implement add_api_key helper for authorization headers
  • Add abspath helper for --store argument
  • Update manual pages and system tests for new options
ramalama/cli.py
ramalama/chat.py
docs/ramalama-chat.1.md
test/system/040-serve.bats
Align tooling and CI with new project layout
  • Replace .flake8 with ruff and codespell configuration in pyproject.toml
  • Add pre-commit hooks for formatting and tests
  • Adjust Makefile to detect Python scripts and run lint/format commands
  • Extend GitHub Actions and Tekton pipeline YAML with additional timeouts and parent image tasks
pyproject.toml
.pre-commit-config.yaml
Makefile
.github/workflows/ci.yml
.github/workflows/install_ramalama.yml
.tekton/pipelines/pull-request-pipeline.yaml
.tekton/pipelines/push-pipeline.yaml
.tekton/tasks/wait-for-image.yaml
.tekton/cuda-llama-server/cuda-llama-server-pull-request.yaml
.tekton/cuda-rag/cuda-rag-pull-request.yaml
.tekton/cuda-whisper-server/cuda-whisper-server-pull-request.yaml
.tekton/ramalama-llama-server/ramalama-llama-server-pull-request.yaml
.tekton/ramalama-rag/ramalama-rag-pull-request.yaml
.tekton/ramalama-whisper-server/ramalama-whisper-server-pull-request.yaml
.tekton/rocm-llama-server/rocm-llama-server-pull-request.yaml
.tekton/rocm-rag/rocm-rag-pull-request.yaml
.tekton/rocm-ubi-llama-server/rocm-ubi-llama-server-pull-request.yaml
.tekton/rocm-ubi-rag/rocm-ubi-rag-pull-request.yaml
.tekton/rocm-ubi-whisper-server/rocm-ubi-whisper-server-pull-request.yaml
.tekton/rocm-whisper-server/rocm-whisper-server-pull-request.yaml
.tekton/cuda-llama-server/cuda-llama-server-push.yaml
.tekton/cuda-rag/cuda-rag-push.yaml
.tekton/cuda-whisper-server/cuda-whisper-server-push.yaml
.tekton/ramalama-llama-server/ramalama-llama-server-push.yaml
.tekton/ramalama-rag/ramalama-rag-push.yaml
.tekton/ramalama-whisper-server/ramalama-whisper-server-push.yaml
.tekton/rocm-llama-server/rocm-llama-server-push.yaml
.tekton/rocm-rag/rocm-rag-push.yaml
.tekton/rocm-ubi-llama-server/rocm-ubi-llama-server-push.yaml
.tekton/rocm-ubi-rag/rocm-ubi-rag-push.yaml
.tekton/rocm-ubi-whisper-server/rocm-ubi-whisper-server-push.yaml
.tekton/rocm-whisper-server/rocm-whisper-server-push.yaml


Signed-off-by: Ian Eaves <[email protected]>
@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Summary of Changes

Hello @ieaves, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant new feature to the ramalama chat application: the ability to incorporate vision-based context (images) into chat conversations through Retrieval Augmented Generation (RAG). This involved a substantial refactoring of the underlying file loading and processing infrastructure to handle diverse document and image formats. Additionally, the configuration system has been upgraded for greater flexibility and nested settings, and the continuous integration/continuous delivery (CI/CD) pipelines have been expanded to support the new RAG-enabled container images.

Highlights

  • Vision-based RAG for Chat: Implemented the capability to augment chat conversations with context from image files (e.g., JPG, PNG) by converting them to base64 data URIs, alongside existing text-based RAG.
  • Overhauled File Loading System: Replaced the previous file_upload module with a more modular file_loaders system, enabling robust processing of various text (e.g., .txt, .md, .json, .yaml) and image file types for RAG.
  • Advanced Configuration Management: Introduced a new layered configuration system that supports nested environment variables (e.g., RAMALAMA_USER__SETTING) and recursive merging of configuration sources, enhancing flexibility and maintainability (see the sketch after this list).
  • Expanded CI/CD for RAG Images: Added new Tekton pipelines for building and pushing RAG-specific container images (e.g., cuda-rag, rocm-rag) and updated existing pipelines to ensure parent images are available before dependent builds.
  • Improved Chat Command Line Interface: Enhanced the chat command with new options: --list to discover available models from an endpoint and --model to explicitly select a model for inferencing.
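
The nested-variable scheme from the configuration highlight can be sketched like this (the real load_env_config also handles legacy variable names and type coercion, which are omitted here):

```python
import os


def load_env_config(env: dict | None = None) -> dict:
    """Sketch: RAMALAMA_USER__SETTING=value -> {"user": {"setting": "value"}}."""
    env = dict(os.environ) if env is None else env
    config: dict = {}
    for name, value in env.items():
        if not name.startswith("RAMALAMA_"):
            continue
        path = name.removeprefix("RAMALAMA_").lower().split("__")
        node = config
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return config


assert load_env_config({"RAMALAMA_USER__SETTING": "1"}) == {"user": {"setting": "1"}}
```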

@sourcery-ai sourcery-ai bot (Contributor) left a comment

Hey @ieaves - I've reviewed your changes - here's some feedback:

  • In load_file_config, config_path is never set before you use it to populate settings.config_file, so that value will always be None—capture the actual config file path when you parse and only add settings after a successful load.
  • The simplified get_default_engine drops the macOS apple_vm check and will always pick 'podman' if available; if you still need to respect Apple VM provisioning, reintroduce that conditional or clearly document the behavior change.
  • BasicImageFileLoader.load emits a MIME type of None when guess_type fails (resulting in data:None;base64,…); consider defaulting to a safe type like 'application/octet-stream' or raising an error instead.
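
For the third point, the fallback is small; a sketch (the helper name here is illustrative — the real site is BasicImageFileLoader.load):

```python
import base64
import mimetypes


def image_to_data_uri(path: str) -> str:
    # guess_type returns (None, None) for unknown extensions, which would
    # otherwise produce the malformed "data:None;base64,..." noted above.
    mime, _ = mimetypes.guess_type(path)
    mime = mime or "application/octet-stream"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```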
## Individual Comments

### Comment 1
<location> `ramalama/file_loaders/file_manager.py:20` </location>
<code_context>
+        self.loaders = {ext.lower(): loader() for loader in self.get_loaders() for ext in loader.file_extensions()}
+
+    def _get_loader(self, file: str) -> base.BaseFileLoader:
+        return self.loaders[os.path.splitext(file)[1].lower()]
+
+    @abstractmethod
</code_context>

<issue_to_address>
No error handling for unsupported file extensions in '_get_loader'.

Accessing 'self.loaders' without checking for the extension may cause a KeyError. Handle missing extensions by raising a custom exception or returning None for clearer errors.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
class BaseFileManager(ABC):
    """
    Base class for file upload handlers.
    This class should be extended by specific file type handlers.
    """

    def __init__(self):
        self.loaders = {ext.lower(): loader() for loader in self.get_loaders() for ext in loader.file_extensions()}

    def _get_loader(self, file: str) -> base.BaseFileLoader:
        return self.loaders[os.path.splitext(file)[1].lower()]

    @abstractmethod
    def load(self):
        pass

    @classmethod
    @abstractmethod
    def get_loaders(cls) -> List[Type[base.BaseFileLoader]]:
        pass
=======
class UnsupportedFileExtensionError(Exception):
    """Raised when a file extension is not supported by any loader."""
    pass

class BaseFileManager(ABC):
    """
    Base class for file upload handlers.
    This class should be extended by specific file type handlers.
    """

    def __init__(self):
        self.loaders = {ext.lower(): loader() for loader in self.get_loaders() for ext in loader.file_extensions()}

    def _get_loader(self, file: str) -> base.BaseFileLoader:
        ext = os.path.splitext(file)[1].lower()
        if ext not in self.loaders:
            raise UnsupportedFileExtensionError(f"Unsupported file extension: {ext}")
        return self.loaders[ext]

    @abstractmethod
    def load(self):
        pass

    @classmethod
    @abstractmethod
    def get_loaders(cls) -> List[Type[base.BaseFileLoader]]:
        pass
>>>>>>> REPLACE

</suggested_fix>

### Comment 2
<location> `ramalama/file_loaders/file_types/txt.py:25` </location>
<code_context>
+        """
+
+        # TODO: Support for non-default encodings?
+        with open(file, "r") as f:
+            return f.read()
</code_context>

<issue_to_address>
No explicit encoding specified when opening text files.

Please specify 'encoding="utf-8"' when opening files to ensure consistent behavior across different systems.
</issue_to_address>
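
Sketched against the snippet above, the requested change is:

```python
def load(file: str) -> str:
    # Explicit encoding avoids locale-dependent defaults (e.g. cp1252 on Windows).
    with open(file, "r", encoding="utf-8") as f:
        return f.read()
```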

### Comment 3
<location> `test/unit/test_file_upload.py:141` </location>
<code_context>
+class TestImageFileManager:
+    """Test the image file manager class."""
+
+    def test_image_file_manager_load_single_image_file(self):
+        """Test loading a single image file."""
+        with patch.object(BasicImageFileLoader, 'load', return_value="data:image/jpeg;base64,test"):
+            manager = ImageFileManager()
+            result = manager.load(["test.jpg"])
+
+        assert len(result) == 1
+        assert result[0] == "data:image/jpeg;base64,test"
+
+    def test_image_file_manager_load_multiple_images(self):
</code_context>

<issue_to_address>
No test for unsupported file extensions in ImageFileManager.

Add a test to confirm that ImageFileManager.load properly handles unsupported file extensions, either by ignoring them or raising an error.

Suggested implementation:

```python
    def test_image_file_manager_load_multiple_images(self):
        """Test loading multiple image files."""
        with patch.object(
            BasicImageFileLoader, 'load', side_effect=["data:image/jpeg;base64,test1", "data:image/png;base64,test2"]
        ):
            manager = ImageFileManager()
            result = manager.load(["test1.jpg", "test2.png"])

        assert len(result) == 2
        assert result[0] == "data:image/jpeg;base64,test1"
        assert result[1] == "data:image/png;base64,test2"

    def test_image_file_manager_load_unsupported_extension(self):
        """Test that unsupported file extensions are handled properly."""
        # Patch the loader so it would raise an error if called with an unsupported extension
        with patch.object(BasicImageFileLoader, 'load') as mock_load:
            manager = ImageFileManager()
            # Assume 'txt' is not a supported image extension
            result = manager.load(["test.txt", "test.jpg"])
            # Only the supported file should be loaded
            assert len(result) == 1
            # The loader should only be called for the supported file
            mock_load.assert_called_once_with("test.jpg")

```

If `ImageFileManager.load` is supposed to raise an error for unsupported extensions instead of ignoring them, change the test to use `pytest.raises` and assert the exception is raised. Adjust the test according to the actual behavior of your implementation.
</issue_to_address>



@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces vision-based RAG support, enhances environment/config parsing, refactors file processing, and extends the CLI with model listing and selection flags. A critical bug in the configuration loading logic needs to be addressed, along with a couple of medium-severity suggestions to improve code clarity. After these fixes, this will be a fantastic contribution.

Signed-off-by: Ian Eaves <[email protected]>
@rhatdan rhatdan (Member) commented Jul 4, 2025

@engelmi PTAL

@@ -17,15 +17,18 @@ def __init__(self):
         self.loaders = {ext.lower(): loader() for loader in self.get_loaders() for ext in loader.file_extensions()}

     def _get_loader(self, file: str) -> base.BaseFileLoader:
-        return self.loaders[os.path.splitext(file)[1].lower()]
+        loader = self.loaders.get(os.path.splitext(file)[1].lower(), None)
A project member commented:

Let's create a file_type local variable for this to make it clearer.

file_type = os.path.splitext(file)[1].lower()
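
Applied to the diff above, that suggestion would read roughly:

```python
def _get_loader(self, file: str) -> base.BaseFileLoader:
    file_type = os.path.splitext(file)[1].lower()
    loader = self.loaders.get(file_type, None)
```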

@rhatdan rhatdan (Member) commented Jul 7, 2025

LGTM

@rhatdan rhatdan merged commit 21957b2 into containers:main Jul 7, 2025
37 checks passed