TMT: run tests with GPUs #1101

Merged
merged 3 commits into from
Jun 20, 2025

Conversation

@lsm5 lsm5 commented Apr 2, 2025

This commit adds TMT test jobs, triggered via Packit, that fetch an instance with an NVIDIA GPU, as specified in `plans/no-rpm.fmf`; GPU availability can be verified in the gpu_info test result.

In addition, system tests (nocontainer), validate, and unit tests are also triggered via TMT.

Fixes: #1054

TODO:

  1. Enable bats-docker tests
  2. Resolve f41 validate test failures
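For context, requesting a GPU-backed instance is done through the plan's provision step. A minimal sketch of what such an FMF plan can look like follows; the key names come from the tmt hardware specification, but treat the exact values as illustrative assumptions rather than the actual contents of `plans/no-rpm.fmf`:

```yaml
# Sketch of a TMT plan requesting an NVIDIA GPU instance (assumed values)
summary: Run non-RPM tests on a GPU instance
provision:
    hardware:
        gpu:
            vendor-name: NVIDIA
execute:
    how: tmt
```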

Summary by Sourcery

Enable GPU-accelerated and comprehensive TMT-based test workflows via Packit and new FMF plans, updating configuration and test scripts to support the enhanced testing pipeline.

New Features:

  • Add TMT-driven GPU tests via Packit using new /plans/rpm and /plans/no-rpm FMF plans for Fedora and CentOS
  • Provide a bats-tests.sh script to manually run docker or nocontainer bats tests under TMT

Enhancements:

  • Consolidate Fedora Copr build targets into fedora-all in .packit.yaml
  • Standardize container build path resolution in container_build.sh

CI:

  • Configure Packit jobs to trigger TMT plans for system, validate, and unit tests

Tests:

  • Mock versioned image lookup in test_accel_image for stable unit testing
  • Update system help test to handle both rootless and root default store paths
  • Add FMF plan files to register TMT-based system and unit test runs

Contributor

sourcery-ai bot commented Apr 2, 2025

Reviewer's Guide

This PR configures Packit to trigger TMT tests on NVIDIA GPU instances by adding dedicated RPM and no-RPM jobs, updates unit and system tests for deterministic behavior, fixes container build paths, and provides a TMT orchestration script alongside FMF plans for automated test runs.

Class diagram for new and updated TMT test job configuration

classDiagram
    class PackitJob {
        +string job
        +string trigger
        +list packages
        +list targets
        +string tmt_plan
        +string identifier
        +bool skip_build
    }
    class FMFPlan {
        +string name
        +list tests
        +string hardware_requirements
    }
    PackitJob "*" -- "*" FMFPlan : uses

    class GPUInstance {
        +string type
        +string vendor
    }
    FMFPlan "1" -- "*" GPUInstance : requests

    %% Highlight new/modified jobs
    class PackitJob {
        <<new/modified>>
    }
    class FMFPlan {
        <<new/modified>>
    }

File-Level Changes

Updated Packit CI configuration to define TMT test jobs with GPU support
  • Replaced multiple Fedora targets with 'fedora-all'
  • Added 'tests' jobs with tmt_plan and identifiers for rpm/no-rpm scenarios
.packit.yaml
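A Packit `tests` job pointing at a TMT plan generally looks like the sketch below; the `tmt_plan` and `identifier` values mirror this PR's description, while the remaining fields are illustrative assumptions, not the exact contents of this repo's `.packit.yaml`:

```yaml
# Illustrative sketch of a Packit "tests" job, not the repo's actual config
jobs:
  - job: tests
    trigger: pull_request
    packages: [ramalama]
    targets:
      - fedora-all
    tmt_plan: "/plans/no-rpm"
    identifier: no-rpm
```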
Isolated GPU image logic in unit tests by mocking version checks
  • Decorated test_accel_image with patch to force versioned image fallback
  • Mocked attempt_to_use_versioned to False for independent testing
test/unit/test_common.py
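The mocking pattern can be sketched roughly as follows; `attempt_to_use_versioned` is the helper named in this PR, but the module layout and the `accel_image` body here are simplified stand-ins, not the real code:

```python
import types
from unittest.mock import patch

# Stand-in module: the real helper inspects the installed package version.
ramalama_common = types.SimpleNamespace(attempt_to_use_versioned=lambda: True)

def accel_image(config):
    # Simplified stand-in for the real tag-selection logic.
    tag = "0.9" if ramalama_common.attempt_to_use_versioned() else "latest"
    return f"quay.io/ramalama/rocm:{tag}"

# Forcing the check to False pins the test to the ":latest" fallback,
# so it no longer depends on which version happens to be installed.
with patch.object(ramalama_common, "attempt_to_use_versioned", return_value=False):
    result = accel_image({})

print(result)  # quay.io/ramalama/rocm:latest
```

Decorating the test function with `@patch(...)` achieves the same isolation as the context manager shown here.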
Enhanced system test to handle rootless vs. non-rootless environments
  • Wrapped default store assertion with rootless conditional
  • Fallback to '/var/lib/ramalama' when not rootless
test/system/015-help.bats
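In plain shell, the rootless-vs-root distinction amounts to the sketch below; the actual bats test uses the suite's own rootless helper rather than a bare `id -u` check:

```shell
# Sketch: the expected default store depends on whether we run as root.
if [ "$(id -u)" -eq 0 ]; then
    expected_store="/var/lib/ramalama"
else
    expected_store="${HOME}/.local/share/ramalama"
fi
echo "expected default store: ${expected_store}"
```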
Fixed container build script path resolution
  • Prefixed container-images path with './' in build loop
container_build.sh
Added TMT orchestration script and FMF plans for rpm/no-rpm workflows
  • Introduced bats-tests.sh for manual docker/nocontainer TMT runs
  • Added FMF plans for rpm and no-rpm testing
test/tmt/bats-tests.sh
plans/no-rpm.fmf
plans/rpm.fmf
test/tmt/no-rpm.fmf
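A wrapper like bats-tests.sh can be sketched as a thin suite selector, so the same entry point works both under TMT and in a local checkout; the make target names here are assumptions:

```shell
# Sketch of a suite-selecting wrapper; make target names are assumed.
set -eu
suite="${1:-nocontainer}"
case "$suite" in
    docker)      target="bats-docker" ;;
    nocontainer) target="bats-nocontainer" ;;
    *) echo "usage: $0 [docker|nocontainer]" >&2; exit 2 ;;
esac
echo "selected make target: ${target}"
# make "${target}"  # run the suite in a real invocation
```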

Possibly linked issues

  • GPU testing with TMT #1054: The PR adds TMT test jobs and configuration to run tests with GPUs, directly addressing the issue's goal.


@lsm5 lsm5 force-pushed the tmt-gpu branch 7 times, most recently from 8e6be74 to 8b8828a Compare April 2, 2025 14:08
lsm5 commented Apr 2, 2025

@ericcurtin @rhatdan we're able to access GPU instances via TMT, and that can be verified through the TMT log. See the /test/tmt/gpu_info results in the TMT log (check "Show passed tests" to see them).

But `make tests` is failing because it can't find the `llama-run` and `llama-bench` commands. How do I get those?

@lsm5 lsm5 linked an issue Apr 2, 2025 that may be closed by this pull request
rhatdan commented Apr 18, 2025

Look at:

.github/workflows/ci.yml: sudo ./container-images/scripts/build_llama_and_whisper.sh

This builds the released version of llama.cpp and whisper.cpp and installs them in the host or in a container.

@lsm5 lsm5 force-pushed the tmt-gpu branch 2 times, most recently from 1b9f459 to eaa58c5 Compare April 23, 2025 10:33
@lsm5 lsm5 force-pushed the tmt-gpu branch 4 times, most recently from 54cc323 to 5116408 Compare June 18, 2025 12:40
lsm5 commented Jun 18, 2025

I'm now seeing these two errors in bats-nocontainer:

not ok 11 [015] ramalama verify default store in 552ms
# (from function `bail-now' in file test/system/helpers.podman.bash, line 122,
#  from function `is' in file test/system/helpers.podman.bash, line 1016,
#  in test file test/system/015-help.bats, line 174)
#   `is "$output" ".*default: ${HOME}/.local/share/ramalama"  "Verify default store"' failed
#
# [13:04:52.219867373] # /var/ARTIFACTS/work-bats-nocontainerd837lq0d/plans/bats-nocontainer/tree/bin/ramalama --help
# [13:04:52.429558178] usage: ramalama [-h] [--container] [--debug] [--dryrun] [--engine ENGINE]
#                 [--image IMAGE] [--keep-groups] [--nocontainer] [--quiet]
#                 [--runtime {llama.cpp,vllm}] [--store STORE]
#                 [--use-model-store]
#                 {bench,benchmark,chat,client,containers,ps,convert,help,info,inspect,list,ls,login,logout,perplexity,pull,push,rag,rm,run,serve,stop,version} ...

----snip ramalama command output----

# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #|     FAIL: Verify default store
# #| expected: '.*default: /root/.local/share/ramalama' (using expr)
# #|   actual: 'usage: ramalama [-h] [--container] [--debug] [--dryrun] [--engine ENGINE]'

and this one as well (it looks like there are issues accessing the URL):

not ok 37 [050] ramalama pull huggingface in 9460ms
# tags: distro-integration
# (from function `bail-now' in file test/system/helpers.podman.bash, line 122,
#  from function `die' in file test/system/helpers.podman.bash, line 848,
#  from function `run_ramalama' in file test/system/helpers.bash, line 186,
#  in test file test/system/050-pull.bats, line 80)
#   `run_ramalama pull hf://TinyLlama/TinyLlama-1.1B-Chat-v1.0' failed
#
# [13:08:45.808278621] # /var/ARTIFACTS/work-bats-nocontainerd837lq0d/plans/bats-nocontainer/tree/bin/ramalama pull hf://Felladrin/gguf-smollm-360M-instruct-add-basics/smollm-360M-instruct-add-basics.IQ2_XXS.gguf
# [13:08:48.288509401] Downloading huggingface://Felladrin/gguf-smollm-360M-instruct-add-basics/smollm-360M-instruct-add-basics.IQ2_XXS.gguf:latest ...
# Trying to pull huggingface://Felladrin/gguf-smollm-360M-instruct-add-basics/smollm-360M-instruct-add-basics.IQ2_XXS.gguf:latest ...

 ---- snip similar looking messages----

# [13:08:54.407791981] NAME                                                                                             MODIFIED     SIZE
# hf://Felladrin/gguf-smollm-360M-instruct-add-basics/smollm-360M-instruct-add-basics.IQ2_XXS.gguf 1 second ago 196.31 MB
#
# [13:08:54.424280234] # /var/ARTIFACTS/work-bats-nocontainerd837lq0d/plans/bats-nocontainer/tree/bin/ramalama rm huggingface://Felladrin/gguf-smollm-360M-instruct-add-basics/smollm-360M-instruct-add-basics.IQ2_XXS.gguf
#
# [13:08:54.710105260] # /var/ARTIFACTS/work-bats-nocontainerd837lq0d/plans/bats-nocontainer/tree/bin/ramalama pull hf://TinyLlama/TinyLlama-1.1B-Chat-v1.0
# [13:08:55.018142961] Downloading huggingface://TinyLlama/TinyLlama-1.1B-Chat-v1.0:latest ...
# Trying to pull huggingface://TinyLlama/TinyLlama-1.1B-Chat-v1.0:latest ...
# URL pull failed and huggingface-cli not available
# Error: Failed to pull model: HTTP Error 400: Bad Request
# [13:08:55.022949511] [ rc=1 (** EXPECTED 0 **) ]
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #| FAIL: exit code is 1; expected 0
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The unit tests are failing on:

>                   assert accel_image(config) == expected_result
E                   AssertionError: assert 'quay.io/ramalama/rocm:0.9' == 'quay.io/ramalama/rocm:latest'
E                     
E                     - quay.io/ramalama/rocm:latest
E                     ?                       ^^^^^^
E                     + quay.io/ramalama/rocm:0.9
E                     ?                       ^^^

See the detailed logs at: https://artifacts.dev.testing-farm.io/74a1da74-2417-4d94-ab38-e067214441d5/

lsm5 commented Jun 18, 2025

I see one issue was that python3-huggingface-hub wasn't installed.

@lsm5 lsm5 force-pushed the tmt-gpu branch 7 times, most recently from a342d35 to 42d7a7e Compare June 18, 2025 19:22
For the rootful case, the default store is at /var/lib/ramalama.

Signed-off-by: Lokesh Mandvekar <[email protected]>
This commit adds TMT test jobs triggered via Packit that fetches an
instance with NVIDIA GPU, specified in `plans/no-rpm.fmf`, and can be
verified in the gpu_info test result.

In addition, system tests (nocontainer), validate, and unit tests are
also triggered via TMT.

Fixes: containers#1054

TODO:
1. Enable bats-docker tests
2. Resolve f41 validate test failures

Signed-off-by: Lokesh Mandvekar <[email protected]>
Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey @lsm5 - I've reviewed your changes - here's some feedback:

  • plans/no-rpm.fmf and plans/rpm.fmf are added but empty—please populate them with the FMF metadata needed for TMT to pick up those test plans.
  • Replacing the two Fedora targets with fedora-all may pull in unintended variants—please verify that it matches the original scope of development and latest-stable.
  • The new bats-tests.sh script duplicates existing CI orchestration logic—consider reusing or refactoring current CI scripts to avoid maintaining parallel test runners.


lsm5 commented Jun 19, 2025

  • plans/no-rpm.fmf and plans/rpm.fmf are added but empty—please populate them with the FMF metadata needed for TMT to pick up those test plans.

They are not empty.

  • Replacing the two Fedora targets with fedora-all may pull in unintended variants—please verify that it matches the original scope of development and latest-stable.

This was intentional, because the validate test breaks on F41.

  • The new bats-tests.sh script duplicates existing CI orchestration logic—consider reusing or refactoring current CI scripts to avoid maintaining parallel test runners.

This is needed for TMT tests so that the scripts can also be run locally without any TMT environment. Ideally this config should live inside the Makefile, but that can come later.

lsm5 commented Jun 19, 2025

@ericcurtin @rhatdan @smooge PTAL. There's one commit from @sarroutbi from #1567 as well to fix a unit test issue.

rhatdan commented Jun 20, 2025

LGTM

@rhatdan rhatdan merged commit 3f87444 into containers:main Jun 20, 2025
20 of 21 checks passed
@lsm5 lsm5 deleted the tmt-gpu branch June 20, 2025 13:10
Successfully merging this pull request may close these issues.

GPU testing with TMT