
Conversation

@farazkh80 (Collaborator) commented May 9, 2025

[TRTLLM-4618][feat] Add E2E tests for Llama3.1-70B, Mixtral 8x7B on RTX6000 Pro (SM120) with FP16/FP8/NVFP4

Description

  • This PR adds end-to-end tests for Llama3.1-70B at FP16, FP8, and NVFP4, and for Mixtral 8x7B at NVFP4, to the TensorRT-LLM test suite, to be run on SM120.
  • The tests will be used by QA as part of the B40 Bring-up (RTX6000 Pro SM120) effort.

Test Coverage

Single node tests

  • test_e2e.py::test_ptp_quickstart_advanced[Llama3.1-8B-BF16-llama-3.1-model/Meta-Llama-3.1-8B]
  • test_e2e.py::test_ptp_quickstart_advanced[Llama3.1-8B-NVFP4-nvfp4-quantized/Meta-Llama-3.1-8B]
  • test_e2e.py::test_ptp_quickstart_advanced[Llama3.1-8B-FP8-llama-3.1-model/Llama-3.1-8B-Instruct-FP8]
  • test_e2e.py::test_ptp_quickstart_advanced[Llama3.1-70B-NVFP4-nvfp4-quantized/Meta-Llama-3.1-70B]
  • test_e2e.py::test_ptp_quickstart_advanced[Llama3.1-70B-FP8-llama-3.1-model/Llama-3.1-70B-Instruct-FP8]
  • test_e2e.py::test_ptp_quickstart_advanced[Mixtral-8x7B-NVFP4-nvfp4-quantized/Mixtral-8x7B-Instruct-v0.1]

These tests will be included in the SM120 verification plan for QA sign-off; a sketch of the parametrized test shape these IDs imply follows the lists below.

Multi-GPU tests

  • test_e2e.py::test_ptp_quickstart_advanced_2gpus_sm120[Llama3.1-70B-BF16-llama-3.1-model/Meta-Llama-3.1-70B]
  • test_e2e.py::test_ptp_quickstart_advanced_2gpus_sm120[Mixtral-8x7B-BF16-Mixtral-8x7B-Instruct-v0.1]
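
For reference, the IDs above follow pytest's default parametrize naming (the parameter values joined with "-"). A minimal sketch of the test shape they imply, assuming a hypothetical quickstart script path, --model_dir flag, and LLM_MODELS_ROOT layout (the actual TensorRT-LLM helpers may differ):

    # Sketch only: the script path, flag name, and models-root layout are
    # assumptions for illustration, not verbatim from this PR.
    import os
    import subprocess

    import pytest

    MODELS_ROOT = os.environ.get("LLM_MODELS_ROOT", "/models")

    @pytest.mark.parametrize(
        "model_name,model_path",
        [
            ("Llama3.1-70B-FP8", "llama-3.1-model/Llama-3.1-70B-Instruct-FP8"),
            ("Llama3.1-70B-NVFP4", "nvfp4-quantized/Meta-Llama-3.1-70B"),
            ("Mixtral-8x7B-NVFP4", "nvfp4-quantized/Mixtral-8x7B-Instruct-v0.1"),
        ],
    )
    def test_ptp_quickstart_advanced(model_name, model_path):
        # Drive the quickstart example end-to-end on the local checkpoint;
        # a non-zero exit code fails the test.
        print(f"[{model_name}] running quickstart_advanced")
        subprocess.run(
            [
                "python", "examples/pytorch/quickstart_advanced.py",
                "--model_dir", os.path.join(MODELS_ROOT, model_path),
            ],
            check=True,
        )

The 2-GPU variant would additionally pass a tensor-parallelism setting (e.g. --tp_size 2, if the script exposes one), and its sm120 suffix presumably lets CI target it at the SM120 stage.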

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".
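
Typical run invocations, posted as PR comments (the stage and GPU names below reuse the examples from the flag descriptions above):

    /bot run
    /bot run --disable-fail-fast
    /bot run --stage-list "A10-1"
    /bot run --gpu-type "A30, H100_PCIe"
    /bot run --post-merge
    /bot run --extra-stage "H100_PCIe-[Post-Merge]-1"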

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since skipping validation without care can break top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since reusing a stale pipeline without care can break top of tree.
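
Illustrative invocations of the remaining subcommands (the skip comment string is a placeholder):

    /bot kill
    /bot skip --comment "Reason for skipping build/test"
    /bot reuse-pipeline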

@farazkh80 force-pushed the b40_test_coverage branch from 7a6d196 to 1284049 on May 12, 2025 16:40
@farazkh80 marked this pull request as ready for review on May 12, 2025 18:01
@farazkh80 (Collaborator, Author)

/bot run

1 similar comment
@pamelap-nvidia (Collaborator)

/bot run

@pamelap-nvidia (Collaborator) left a comment

LG! Once you update the test case, I can help trigger a bot run with RTX 6000.

@farazkh80 force-pushed the b40_test_coverage branch from 549e544 to cc55603 on May 13, 2025 18:38
@farazkh80 requested a review from pamelap-nvidia on May 13, 2025 18:43
@pamelap-nvidia (Collaborator)

bot run --stage-list "RTXPro6000-PyTorch-[Post-Merge]-1"

@pamelap-nvidia (Collaborator)

/bot run --stage-list "RTXPro6000-PyTorch-[Post-Merge]-1"

@tensorrt-cicd (Collaborator)

PR_Github #5057 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #5057 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3682 (Partly Tested) completed with status: 'SUCCESS'

@pamelap-nvidia (Collaborator)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #5071 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #5071 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3691 completed with status: 'SUCCESS'

@farazkh80 force-pushed the b40_test_coverage branch from 4cae7f7 to 048cdb6 on May 14, 2025 15:24
@pamelap-nvidia (Collaborator)

/bot reuse-pipeline

@tensorrt-cicd (Collaborator)

PR_Github #5197 [ reuse-pipeline ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #5197 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #5071 for commit 048cdb6

@pamelap-nvidia merged commit 42de79d into NVIDIA:main on May 14, 2025
3 checks passed