[AutoDeploy] configurable cache resize #4372

lucaslie · 2025-05-15T22:40:15Z

Make the cache resize configurable so it can be turned down when it is not needed.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause the top of tree to break.

Signed-off-by: Lucas Liebenwein <[email protected]>

lucaslie · 2025-05-15T22:40:31Z

/bot run

Copilot

Pull Request Overview

This PR makes cache resize configurable by introducing a new free_mem_ratio option and updates various tests and components to leverage this setting.

Updates unit and integration tests to pass a modified free_mem_ratio, reducing memory occupation to mitigate OOM issues.
Updates the transformation and shim modules to use free_mem_ratio from the configuration.
Adjusts example scripts to propagate the free_mem_ratio parameter to the AutoDeploy component.
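
As a rough illustration of what a configurable resize ratio implies, the sketch below grows a paged cache by a fraction of the currently free memory. It is not the code from this PR: `extra_cache_bytes`, `resized_num_pages`, and the `bytes_per_page` parameter are hypothetical names chosen for the example, and only `free_mem_ratio` comes from the change itself.

```python
def extra_cache_bytes(free_bytes: int, free_mem_ratio: float) -> int:
    """Hypothetical helper: bytes of currently free memory to hand to the cache."""
    if not 0.0 <= free_mem_ratio <= 1.0:
        raise ValueError(f"free_mem_ratio must be in [0, 1], got {free_mem_ratio}")
    return int(free_bytes * free_mem_ratio)


def resized_num_pages(
    current_pages: int,
    free_bytes: int,
    bytes_per_page: int,
    free_mem_ratio: float,
) -> int:
    """Hypothetical helper: page count after growing the cache.

    Only whole pages are added, so any remainder of the budget is left free.
    """
    extra_pages = extra_cache_bytes(free_bytes, free_mem_ratio) // bytes_per_page
    return current_pages + extra_pages
```

Under these assumptions, a ratio close to zero (as the updated tests use) barely grows the cache, and a ratio of 0.0 leaves it at its current size.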

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_build_small_single.py	Sets free_mem_ratio to 0.01 and adds a debug print for SimpleConfig.
tests/unittest/_torch/auto_deploy/unit/multigpu/test_ad_build_small_multi.py	Sets free_mem_ratio to 0.01 and adds a debug print for SimpleConfig.
tests/unittest/_torch/auto_deploy/integration/test_ad_build.py	Updates configuration by setting free_mem_ratio before running main().
tensorrt_llm/_torch/auto_deploy/transformations/transform.py	Uses free_mem_ratio from ad_config in the resize_kv_cache call.
tensorrt_llm/_torch/auto_deploy/shim/interface.py	Adds free_mem_ratio field to AutoDeployConfig.
examples/auto_deploy/simple_config.py	Adds free_mem_ratio field to SimpleConfig.
examples/auto_deploy/build_and_run_ad.py	Passes free_mem_ratio from SimpleConfig to the build function.
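
The propagation summarized in the table can be pictured as follows. This is a minimal sketch, assuming both configs are plain dataclasses with a single field. The real `SimpleConfig` and `AutoDeployConfig` carry more fields, `to_ad_config` is a hypothetical helper, and the 0.8 default is the value quoted in Copilot's review comment rather than one verified against the code.

```python
from dataclasses import dataclass


@dataclass
class SimpleConfig:
    """Stand-in for the example script config; only the new field is modeled."""

    free_mem_ratio: float = 0.8  # assumed default, per the review comment


@dataclass
class AutoDeployConfig:
    """Stand-in for the shim-level config; only the new field is modeled."""

    free_mem_ratio: float = 0.8  # assumed default, per the review comment


def to_ad_config(simple: SimpleConfig) -> AutoDeployConfig:
    """Hypothetical helper: forward the user-facing setting to the backend config."""
    return AutoDeployConfig(free_mem_ratio=simple.free_mem_ratio)


# A test can shrink the cache by overriding the ratio before building.
test_config = SimpleConfig(free_mem_ratio=0.01)
ad_config = to_ad_config(test_config)
```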

Comments suppressed due to low confidence (2)

tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_build_small_single.py:77

Document why the free_mem_ratio is set to 0.01 in tests, which is significantly different from the default (0.8) in production configurations, to ensure clarity for future maintainers.

simple_config.free_mem_ratio = 0.01  # we don't need the cache and it may cause OOM issues

tests/unittest/_torch/auto_deploy/unit/multigpu/test_ad_build_small_multi.py:63

Document why the free_mem_ratio is set to 0.01 in tests, which is significantly different from the default (0.8) in production configurations, to ensure clarity for future maintainers.

simple_config.free_mem_ratio = 0.01  # we don't need the cache and it may cause OOM issues

tensorrt-cicd · 2025-05-15T22:46:20Z

PR_Github #5414 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-16T01:30:58Z

PR_Github #5414 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3951 completed with status: 'SUCCESS'

[AutoDeploy] configurable cache resize

a31c48a

Signed-off-by: Lucas Liebenwein <[email protected]>

lucaslie requested review from Fridah-nv, Copilot, kaiyux, sugunav14 and suyoggupta May 15, 2025 22:40

lucaslie self-assigned this May 15, 2025

github-project-automation bot added this to AutoDeploy Board May 15, 2025

github-project-automation bot moved this to Backlog in AutoDeploy Board May 15, 2025

lucaslie removed the request for review from kaiyux May 15, 2025 22:40

lucaslie moved this from Backlog to In review in AutoDeploy Board May 15, 2025

Copilot AI reviewed May 15, 2025

tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_build_small_single.py (resolved)

tests/unittest/_torch/auto_deploy/unit/multigpu/test_ad_build_small_multi.py (resolved)

lucaslie added the AutoDeploy <NV> AutoDeploy Backend label May 15, 2025

suyoggupta approved these changes May 15, 2025

lucaslie merged commit 8e4320e into NVIDIA:main May 16, 2025
3 checks passed

github-project-automation bot moved this from In review to Done in AutoDeploy Board May 16, 2025
