@baptistecolle commented Apr 10, 2025

What does this PR do?

This PR adds CI support for the Gaudi backend. It includes an integration test that starts the model "meta-llama/Llama-3.1-8B-Instruct", performs a few requests, and verifies that the outputs match the expected results.

Additional models are also supported, but running tests for all of them is quite slow, so they are not included in the CI by default. However, instructions on how to run the integration tests for all supported models have been added to the Gaudi backend README.
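
Conceptually, each test boils down to a request/snapshot comparison. Here is a minimal sketch of that check against TGI's /generate HTTP endpoint (the URL, prompt, token count, and expected string below are placeholders, not the actual test fixtures):

```python
import requests

# Assumes a TGI server is already running on the Gaudi device with
# --model-id meta-llama/Llama-3.1-8B-Instruct. The URL and expected
# output are placeholders, not the real test fixtures.
TGI_URL = "http://localhost:8080"
EXPECTED_OUTPUT = "..."  # placeholder: snapshot of the known-good output


def test_generate_matches_expected():
    payload = {
        "inputs": "What is Deep Learning?",
        "parameters": {"max_new_tokens": 32, "do_sample": False},
    }
    resp = requests.post(f"{TGI_URL}/generate", json=payload, timeout=120)
    resp.raise_for_status()
    # Greedy decoding is deterministic, so the generated text can be
    # compared against a stored snapshot.
    assert resp.json()["generated_text"] == EXPECTED_OUTPUT
```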

@baptistecolle commented Apr 22, 2025

I’ll wait for the Gaudi integration test CI to pass before merging anything:
https://github.com/huggingface/text-generation-inference/actions/runs/14591230970/job/40927197928?pr=3160

The previous run was green, which gives me confidence in the current changes:
https://github.com/huggingface/text-generation-inference/actions/runs/14384130453/job/40336095297

Unfortunately, it can take days to get assigned a Gaudi1 runner 😭, so I figured I could start iterating on your reviews in the meantime rather than wait for the CI to finish before requesting feedback. In any case, I'll only merge once the Gaudi integration test also passes in the CI.

@baptistecolle baptistecolle marked this pull request as ready for review April 22, 2025 10:01
@regisss left a comment

LGTM!

We should soon have access to Gaudi2 and Gaudi3 ephemeral runners on demand, which will make things much easier than waiting for a DL1 instance. I suggest we wait for those to be available before updating and merging this PR.

@baptistecolle

OK, I'll wait for the new runners before adding Gaudi to the CI; the DL1 runners are indeed very unreliable.

@baptistecolle baptistecolle marked this pull request as draft April 23, 2025 07:42
Narsil previously approved these changes Apr 23, 2025
@Narsil left a comment

LGTM

@baptistecolle

The runners for Gaudi are ready! 🙌 Thanks @regisss

I'm just requesting new reviews to make sure everything is still okay. Since the last review, I rebased on main and switched to the new runners. The integration tests are now passing and the runners are super fast! https://github.com/huggingface/text-generation-inference/actions/runs/15160963395/job/42627380206?pr=3160

@baptistecolle baptistecolle marked this pull request as ready for review May 21, 2025 11:49
@@ -129,9 +129,9 @@ jobs:
 export label_extension="-gaudi"
 export docker_volume="/mnt/cache"
 export docker_devices=""
-export runs_on="ubuntu-latest"
+export runs_on="itac-bm-emr-gaudi3-dell-1gaudi"
Collaborator:
All tests are going to pass with 1 device only? Big (i.e. 70B+ parameters) models are not tested?

@baptistecolle commented May 22, 2025
Indeed, I disabled big models and only tested a small one for faster iteration. I just re-enabled a multi-card test and it is broken 😬. There seems to be a regression between the original PR and the latest TGI backend, so I am looking into it 👀. The error also differs depending on the hardware (Gaudi 1 vs. Gaudi 3) 😣

@regisss commented May 22, 2025

@baptistecolle A couple of questions:

  • It's not possible to select a specific runner for each test config, right?
  • If I want to add a new model to test, I just need to add a new test config in test_gaudi_generate.py?

@baptistecolle commented May 22, 2025

> @baptistecolle A couple of questions:
>
> • It's not possible to select a specific runner for each test config, right?
> • If I want to add a new model to test, I just need to add a new test config in test_gaudi_generate.py?

1. No, it is not. I think this would require some rework of the build workflow, which is global across all hardware. The best alternative would be to use a runner with 8 cards and then set HABANA_VISIBLE_DEVICES=1.
2. Yes, that's correct.

One additional useful remark: you also need to add the new config with "run_by_default": True for it to run in the CI. There are a lot of tests, so for faster CI runs I only execute a subset of them rather than every model we support. A hypothetical example entry is sketched below.
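
For illustration, a config entry might look like the sketch below. Only the "run_by_default" key is confirmed above; the other field names (model_id, input, expected_greedy_output, args) are assumptions about the schema in test_gaudi_generate.py, not its actual contents:

```python
# Hypothetical test config entry -- field names other than
# "run_by_default" are illustrative and may not match the real schema.
TEST_CONFIGS = {
    "llama-3.1-8b-instruct": {
        "model_id": "meta-llama/Llama-3.1-8B-Instruct",
        "input": "What is Deep Learning?",
        "expected_greedy_output": "...",  # snapshot of the deterministic output
        "args": [],  # extra server flags, if any
        "run_by_default": True,  # include this config in the default CI run
    },
}
```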

@baptistecolle baptistecolle marked this pull request as draft May 22, 2025 07:24
@baptistecolle baptistecolle marked this pull request as ready for review June 23, 2025 14:15
@baptistecolle baptistecolle requested a review from regisss June 23, 2025 14:15
@regisss left a comment

I just left a couple of comments regarding the runner size.
I'll add more tests later anyway for Llama4 and R1 on 8 devices.

Do you know if there is a nightly CI in TGI?

 export platform=""
-export extra_pytest=""
+export extra_pytest="--gaudi"
Collaborator:

This will run the models with "run_by_default": True in PRs, right?
If yes, I think we should change the runner above from itac-bm-emr-gaudi3-dell-8gaudi to itac-bm-emr-gaudi3-dell-2gaudi so that we test Llama 8B on a single device and on 2 devices.

@baptistecolle:

Yes, that's correct: "run_by_default": True makes the test run in the CI.

The CI runs `make -C backends/gaudi run-integration-tests`, which executes all the "run_by_default": True tests. There is also `make -C backends/gaudi run-integration-tests-with-all-models`, which runs every model config defined in the test cases; this is useful when doing a big refactoring to check that everything still works as expected.

@baptistecolle commented Jun 24, 2025

I updated the CI to use a smaller runner, so there is now a test on 1 Gaudi card and a test on 2 Gaudi cards to exercise the model-sharding logic (a rough sketch of a 2-card config follows below).
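
For illustration, a 2-card entry could reuse the hypothetical schema sketched earlier and pass TGI's --num-shard launcher flag, which shards a model across devices (the flag is real TGI CLI; the surrounding field names remain assumptions):

```python
# Hypothetical 2-card config entry, reusing the assumed schema from the
# earlier sketch. "--num-shard 2" is TGI's launcher flag for sharding a
# model across 2 devices; the other keys are illustrative only.
SHARDED_TEST_CONFIGS = {
    "llama-3.1-8b-instruct-sharded": {
        "model_id": "meta-llama/Llama-3.1-8B-Instruct",
        "args": ["--num-shard", "2"],  # shard across 2 Gaudi cards
        "run_by_default": True,
    },
}
```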

To answer your question:

> Do you know if there is a nightly CI in TGI?

No, there is not, but given the active development the Gaudi CI should effectively run daily: any change to main rebuilds the Gaudi image and tests it (this is the same for all the TGI variants).

@baptistecolle baptistecolle requested a review from regisss June 24, 2025 08:05
@regisss left a comment

LGTM!

@regisss regisss merged commit 9f38d93 into main Jun 24, 2025
32 of 33 checks passed
@regisss regisss deleted the gaudi/add-ci branch June 24, 2025 16:51