[None][doc] Move AutoDeploy README.md to torch docs #6528


Merged: 5 commits, Aug 8, 2025

Conversation

Collaborator

@Fridah-nv Fridah-nv commented Jul 31, 2025

Summary by CodeRabbit

  • Documentation
    • Introduced comprehensive documentation for the new experimental "AutoDeploy" feature, covering its overview, installation, usage, and advanced configuration.
    • Added detailed guides on building and running AutoDeploy examples, expert-level configuration options, workflow integration, and logging configuration.
    • Provided a support matrix outlining compatible models, runtimes, backends, and precision formats.
    • Included instructions for configuring logging and offered example commands for various deployment scenarios.
    • Added a new section on known issues highlighting the experimental nature of AutoDeploy.
    • Added benchmarking documentation for measuring AutoDeploy model performance using the trtllm-bench utility.

Description

Built the docs and tested them locally.

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
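
For example, an illustrative invocation combining several of the options above (flags exactly as documented; the stage name reuses the documented example value):

/bot run --stage-list "A10-PyTorch-1" --disable-fail-fast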

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause the top of tree to break.
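
An illustrative example: /bot skip --comment "Docs-only change; build/test not required."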

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause the top of tree to break.

Contributor

coderabbitai bot commented Jul 31, 2025

Caution

Review failed

Failed to post review comments.

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 55157c5 and ac07367.

📒 Files selected for processing (8)
  • docs/source/torch.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/benchmarking_with_trtllm_bench.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/example_run.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/expert_configurations.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/logging.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/workflow.md (1 hunks)
  • docs/source/torch/auto_deploy/auto-deploy.md (1 hunks)
  • docs/source/torch/auto_deploy/support_matrix.md (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • docs/source/torch/auto_deploy/advanced/example_run.md
  • docs/source/torch/auto_deploy/advanced/expert_configurations.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • docs/source/torch.md
  • docs/source/torch/auto_deploy/advanced/logging.md
  • docs/source/torch/auto_deploy/advanced/benchmarking_with_trtllm_bench.md
  • docs/source/torch/auto_deploy/support_matrix.md
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • docs/source/torch/auto_deploy/advanced/workflow.md
  • docs/source/torch/auto_deploy/auto-deploy.md
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • docs/source/torch/auto_deploy/advanced/workflow.md
  • docs/source/torch/auto_deploy/auto-deploy.md
📚 Learning: 2025-08-06T13:58:07.506Z
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

Applied to files:

  • docs/source/torch/auto_deploy/advanced/workflow.md
  • docs/source/torch/auto_deploy/auto-deploy.md
📚 Learning: 2025-07-22T09:22:14.726Z
Learnt from: yechank-nvidia
PR: NVIDIA/TensorRT-LLM#6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()` is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call `strip_for_generation()` to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.

Applied to files:

  • docs/source/torch/auto_deploy/auto-deploy.md
🪛 markdownlint-cli2 (0.17.2)
docs/source/torch/auto_deploy/advanced/workflow.md

7-7: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/source/torch/auto_deploy/auto-deploy.md

7-7: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)


16-16: Trailing punctuation in heading
Punctuation: ':'

(MD026, no-trailing-punctuation)

🔇 Additional comments (1)
docs/source/torch/auto_deploy/advanced/workflow.md (1)

7-26: No changes required: example imports and kwargs align with current API

Verified that:

  • LLM is exported from tensorrt_llm._torch.auto_deploy
  • All constructor kwargs in the snippet (model, world_size, compile_backend, model_kwargs, attn_backend, attn_page_size, skip_loading_weights, model_factory, mla_backend, free_mem_ratio, simple_shard_only, max_seq_len, max_batch_size) exactly match the fields on AutoDeployConfig/LlmArgs.

No updates are needed.
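
For orientation, a minimal sketch of such a snippet, using a subset of the verified kwargs; the concrete values are illustrative (borrowed from examples elsewhere in this PR), not verified defaults:

```python
# Illustrative sketch only: the kwargs below are among those verified above,
# but the values shown are example placeholders, not defaults.
from tensorrt_llm._torch.auto_deploy import LLM

llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # HF model card or local checkpoint dir
    world_size=2,                     # number of GPUs to shard across
    compile_backend="torch-compile",  # one of the documented backends (e.g., torch-opt)
    attn_backend="flashinfer",        # attention backend named in the support matrix
    max_seq_len=2048,
    max_batch_size=8,
)
```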

📝 Walkthrough

Walkthrough

This update introduces comprehensive documentation for the new experimental "AutoDeploy" feature, which enables seamless deployment of PyTorch models to TRT-LLM. The changes add multiple markdown files covering installation, usage, advanced configuration, logging, workflow integration, benchmarking, and a detailed support matrix. No code or public API changes are included.

Changes

  • Main AutoDeploy Documentation (docs/source/torch/auto_deploy/auto-deploy.md): Adds a comprehensive overview and user guide for the experimental AutoDeploy feature, describing its motivation, features, installation, usage, and links to further documentation.
  • Support Matrix (docs/source/torch/auto_deploy/support_matrix.md): Introduces a support matrix detailing supported models, runtimes, compile backends, attention backends, and precision formats for AutoDeploy, including a summary of the end-to-end deployment workflow.
  • Advanced Usage & Configuration (docs/source/torch/auto_deploy/advanced/expert_configurations.md, example_run.md, logging.md, workflow.md, benchmarking_with_trtllm_bench.md): Adds new documentation files covering advanced configuration (expert options, YAML/CLI merging, precedence), example commands for running AutoDeploy, logging configuration, integration of AutoDeploy into custom workflows with code examples, and benchmarking AutoDeploy models using the trtllm-bench utility with configuration details and performance tips.
  • General Documentation Update (docs/source/torch.md): Updates the main documentation to reference the new experimental AutoDeploy feature under the "Known Issues" section, with a link to its documentation.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Documentation
    User->>Documentation: Reads AutoDeploy overview
    User->>Documentation: Consults support matrix for model/backend compatibility
    User->>Documentation: Reviews advanced configuration and example usage
    User->>Documentation: Learns about logging, workflow integration, and benchmarking
```

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested labels

Documentation

Suggested reviewers

  • litaotju
  • kaiyux
  • nv-guomingz
  • Shixiaowei02


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (8)
docs/source/auto_deploy/advanced/logging.md (2)

3-4: Minor wording / punctuation nit
The semicolon after “verbosity” reads oddly. A colon works better:

-Use the following env variable to specify the logging level of our built-in logger ordered by
-decreasing verbosity;
+Use the following environment variable to specify the logging level of the built-in logger, ordered by
+decreasing verbosity:

6-12: Reduce repetition – show level placeholder once
Five identical assignments suggest users need five exports; actually only one is needed. Consider:

-```bash
-AUTO_DEPLOY_LOG_LEVEL=DEBUG
-AUTO_DEPLOY_LOG_LEVEL=INFO
-AUTO_DEPLOY_LOG_LEVEL=WARNING
-AUTO_DEPLOY_LOG_LEVEL=ERROR
-AUTO_DEPLOY_LOG_LEVEL=INTERNAL_ERROR
-```
+```bash
+# Choose one of: DEBUG | INFO | WARNING | ERROR | INTERNAL_ERROR
+export AUTO_DEPLOY_LOG_LEVEL=INFO
+```

This keeps the doc concise and avoids the impression multiple exports are required.
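
The level can also be set inline for a single run, e.g. `AUTO_DEPLOY_LOG_LEVEL=DEBUG python build_and_run_ad.py --model "<HF_MODEL>"` (illustrative; the script name follows the example used elsewhere in these docs, and the model argument is a placeholder).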

docs/source/auto_deploy/advanced/mixed_precision_quantization.md (1)

14-17: Ensure CLI flag syntax consistency (--arg=value).

Most examples in the AutoDeploy docs use the --key=value style. Here a space is used, which can break parsing for many CLI frameworks.

-python build_and_run_ad.py --model "<MODELOPT_CKPT_PATH>" --args.world-size 1
+python build_and_run_ad.py --model "<MODELOPT_CKPT_PATH>" --args.world-size=1
docs/source/auto-deploy.md (2)

8-10: Replace raw HTML header with a Markdown heading.

Sphinx + MyST renders standard Markdown headings more reliably than embedded HTML.

-<h4> Seamless Model Deployment from PyTorch to TRT-LLM</h4>
+#### Seamless Model Deployment from PyTorch to TRT-LLM

41-43: Fix mixed-case “LLama” spelling.

The line uses “LLama”, whereas the project consistently uses “LLaMA”.

-You are ready to run an in-framework LLama Demo now.
+You are ready to run an in-framework LLaMA demo now.
docs/source/auto_deploy/advanced/expert_configurations.md (1)

6-16: Unify emphasis style to satisfy markdown-lint (MD049).

The file mixes underscores and asterisks for emphasis; the linter expects asterisks.

-_exclusively_
+*exclusively*

(Apply similarly to other occurrences.)

docs/source/auto_deploy/advanced/model_eval.md (2)

3-3: Add direct reference & correct capitalization for LM Evaluation Harness.

Consider linking to the official repo (https://github.com/EleutherAI/lm-evaluation-harness) and capitalising the tool’s name for consistency with other docs.


6-7: Clarify the “model is defined the same as above” cross-reference.

This file is self-contained; readers landing here directly won’t know which “above” section you mean. Either include the earlier snippet inline or link explicitly to the relevant doc section.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6d5da9f and 0d54263.

📒 Files selected for processing (9)
  • docs/source/auto-deploy.md (1 hunks)
  • docs/source/auto_deploy/advanced/example_run.md (1 hunks)
  • docs/source/auto_deploy/advanced/expert_configurations.md (1 hunks)
  • docs/source/auto_deploy/advanced/logging.md (1 hunks)
  • docs/source/auto_deploy/advanced/mixed_precision_quantization.md (1 hunks)
  • docs/source/auto_deploy/advanced/model_eval.md (1 hunks)
  • docs/source/auto_deploy/advanced/workflow.md (1 hunks)
  • docs/source/auto_deploy/support_matrix.md (1 hunks)
  • docs/source/index.rst (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • docs/source/auto_deploy/advanced/workflow.md
  • docs/source/auto_deploy/support_matrix.md
  • docs/source/auto-deploy.md
  • docs/source/auto_deploy/advanced/expert_configurations.md
🪛 markdownlint-cli2 (0.17.2)
docs/source/auto_deploy/advanced/workflow.md

7-7: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/source/auto_deploy/advanced/expert_configurations.md

7-7: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


7-7: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


12-12: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


12-12: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)

🔇 Additional comments (3)
docs/source/auto_deploy/advanced/workflow.md (1)

30-32: Hard-coded relative links will not resolve in published docs
../../tensorrt_llm/_torch/auto_deploy/llm.py points outside docs/ and Sphinx cannot copy that file.

Use one of:
• A GitHub permalink, e.g. https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/_torch/auto_deploy/llm.py
  • Or, if API docs are generated, cross-link with :py:class:`tensorrt_llm._torch.auto_deploy.LLM`.

Broken links will fail the linkcheck job.

docs/source/auto_deploy/advanced/example_run.md (1)

26-29: Spelling / consistency of runtime value
--args.runtime "demollm" looks suspicious – elsewhere in the project the engine is spelled demo_llm or demollm? Please verify the exact token; a typo will cause argument-parsing errors at runtime.

docs/source/auto_deploy/advanced/model_eval.md (1)

6-11: Quote --model_args to avoid shell-level splitting.

Without quotes, Bash treats the comma as an argument separator, so only model=meta-llama/Meta-Llama-3.1-8B-Instruct is passed to --model_args and world_size=2 becomes a standalone positional arg, causing the script to fail.

-python lm_eval_ad.py \
---model autodeploy --model_args model=meta-llama/Meta-Llama-3.1-8B-Instruct,world_size=2 --tasks mmlu
+python lm_eval_ad.py \
+  --model autodeploy \
+  --model_args "model=meta-llama/Meta-Llama-3.1-8B-Instruct,world_size=2" \
+  --tasks mmlu
⛔ Skipped due to learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

@Fridah-nv changed the title from "[AutoDeploy]: Move AutoDeploy README.md to torch docs" to "doc:[AutoDeploy]: Move AutoDeploy README.md to torch docs" on Jul 31, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
docs/source/torch/auto_deploy/advanced/expert_configurations.md (1)

6-12: Broken links & inconsistent emphasis; switch to Sphinx cross-refs

  • AutoDeployConfig links back to llm_args.py rather than the actual config module.
  • _exclusively_ / _ignored_ use underscore emphasis while the rest of the doc uses asterisks – flagged by MD049.
  • Direct source-file links are fragile in rendered docs; prefer :py:class: or an absolute GitHub URL.
-  _exclusively_ exposed in the [`AutoDeployConfig` class](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py).
+  *exclusively* exposed in the :py:class:`tensorrt_llm._torch.auto_deploy.auto_deploy_config.AutoDeployConfig`.

...
-  object are overlapping, duplicated, and/or _ignored_ in AutoDeploy
+  object are overlapping, duplicated, and/or *ignored* in AutoDeploy
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0d54263 and c1e59fd.

📒 Files selected for processing (7)
  • docs/source/torch.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/example_run.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/expert_configurations.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/logging.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/workflow.md (1 hunks)
  • docs/source/torch/auto_deploy/auto-deploy.md (1 hunks)
  • docs/source/torch/auto_deploy/support_matrix.md (1 hunks)
✅ Files skipped from review due to trivial changes (6)
  • docs/source/torch/auto_deploy/advanced/logging.md
  • docs/source/torch.md
  • docs/source/torch/auto_deploy/advanced/example_run.md
  • docs/source/torch/auto_deploy/advanced/workflow.md
  • docs/source/torch/auto_deploy/auto-deploy.md
  • docs/source/torch/auto_deploy/support_matrix.md
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • docs/source/torch/auto_deploy/advanced/expert_configurations.md
🪛 markdownlint-cli2 (0.17.2)
docs/source/torch/auto_deploy/advanced/expert_configurations.md

7-7: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


7-7: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


12-12: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


12-12: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)

🔇 Additional comments (1)
docs/source/torch/auto_deploy/advanced/expert_configurations.md (1)

170-176: Confirm default config path is correct

cat tensorrt_llm/_torch/auto_deploy/config/default.yaml assumes the file lives inside _torch/auto_deploy/config/.
In the current tree the directory is usually .../auto_deploy/configs/ (plural). Please verify the exact location or update the path; otherwise the example will mislead users.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
docs/source/torch/auto_deploy/advanced/expert_configurations.md (1)

21-27: Fix “PyDantic” typo and broken llm_args link
Same issue flagged previously; still unfixed.

-… through a flexible argument parser powered by PyDantic Settings and OmegaConf.
+… through a flexible argument parser powered by Pydantic Settings and OmegaConf.

-… in `tensorrt_llm/_torch/auto_deploy/llm_args`.
+… in `tensorrt_llm/_torch/auto_deploy/llm_args.py`.
🧹 Nitpick comments (1)
docs/source/torch/auto_deploy/advanced/workflow.md (1)

7-28: Add a language spec to the fenced code block

Missing a language identifier violates MD040 and prevents syntax highlighting.

-```
+```python
   from tensorrt_llm._torch.auto_deploy import LLM
   ...
 llm = LLM(
@@
 )
-```
+```  # closing fence
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 80dfcb5 and ed9f75a.

📒 Files selected for processing (4)
  • docs/source/torch/auto_deploy/advanced/example_run.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/expert_configurations.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/workflow.md (1 hunks)
  • docs/source/torch/auto_deploy/auto-deploy.md (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • docs/source/torch/auto_deploy/advanced/example_run.md
  • docs/source/torch/auto_deploy/auto-deploy.md
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • docs/source/torch/auto_deploy/advanced/expert_configurations.md
  • docs/source/torch/auto_deploy/advanced/workflow.md
🪛 markdownlint-cli2 (0.17.2)
docs/source/torch/auto_deploy/advanced/expert_configurations.md

7-7: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


7-7: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


12-12: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


12-12: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)

docs/source/torch/auto_deploy/advanced/workflow.md

7-7: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check

@Fridah-nv force-pushed the user/fridah/ad-doc branch from ed9f75a to 5361c63 on August 1, 2025 00:30
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
docs/source/torch/auto_deploy/advanced/expert_configurations.md (2)

21-21: “PyDantic” → “Pydantic” (typo still present)

The same typo was flagged in a previous review but remains unfixed.

-argument parser powered by PyDantic Settings
+argument parser powered by Pydantic Settings

21-21: Fix 404-prone link – missing .py extension

The LlmArgs link ends without .py, producing a 404.

-.../tensorrt_llm/_torch/auto_deploy/llm_args)
+.../tensorrt_llm/_torch/auto_deploy/llm_args.py)
🧹 Nitpick comments (5)
docs/source/torch/auto_deploy/advanced/workflow.md (3)

7-8: Add a language identifier to the fenced code block

markdownlint (MD040) flags the fenced code block because it lacks a language spec.
Specify python so syntax highlighting works and the docs build does not warn.

-```
+```python

Also applies to: 28-28


12-26: Replace angle-bracket placeholders with literal examples or back-tick them

The placeholders (<HF_MODEL_CARD_OR_DIR>, <DESIRED_WORLD_SIZE>, etc.) render as HTML tags and disappear in the generated docs, confusing readers. Either:

  1. Wrap them in back-ticks, or
  2. Provide concrete example values (e.g., "meta-llama/Llama-2-7b-chat-hf").
-    model=<HF_MODEL_CARD_OR_DIR>,
+    model="<HF_MODEL_CARD_OR_DIR>",

30-32: Use proper Sphinx cross-refs for internal API links

LLM and AutoDeployConfig are referenced as plain code. Converting to
:py:class:`tensorrt_llm._torch.auto_deploy.llm.LLM` and :py:class:`tensorrt_llm._torch.auto_deploy.llm_args.AutoDeployConfig` enables
inter-doc linking and prevents broken anchors.

docs/source/torch/auto_deploy/support_matrix.md (1)

59-60: Broken external link formatting

flashinfer link includes a trailing “.git” which needlessly appears in the
docs. Use the bare repo URL or a short link:

-[flashinfer](https://github.com/flashinfer-ai/flashinfer.git)
+[flashinfer](https://github.com/flashinfer-ai/flashinfer)
docs/source/torch/auto_deploy/advanced/expert_configurations.md (1)

6-16: Prefer asterisk emphasis to comply with MD049

Several underscores (_exclusively_, _ignored_) trigger MD049. Switch to
*exclusively*, *ignored* for lint-clean docs.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ed9f75a and 5361c63.

📒 Files selected for processing (7)
  • docs/source/torch.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/example_run.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/expert_configurations.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/logging.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/workflow.md (1 hunks)
  • docs/source/torch/auto_deploy/auto-deploy.md (1 hunks)
  • docs/source/torch/auto_deploy/support_matrix.md (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • docs/source/torch/auto_deploy/advanced/logging.md
  • docs/source/torch/auto_deploy/advanced/example_run.md
  • docs/source/torch/auto_deploy/auto-deploy.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/source/torch.md
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • docs/source/torch/auto_deploy/support_matrix.md
  • docs/source/torch/auto_deploy/advanced/expert_configurations.md
  • docs/source/torch/auto_deploy/advanced/workflow.md
🪛 markdownlint-cli2 (0.17.2)
docs/source/torch/auto_deploy/advanced/expert_configurations.md

7-7: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


7-7: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


12-12: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


12-12: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)

docs/source/torch/auto_deploy/advanced/workflow.md

7-7: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

@Fridah-nv force-pushed the user/fridah/ad-doc branch from 5361c63 to 29534cf on August 5, 2025 20:19
@Fridah-nv requested a review from a team as a code owner on August 5, 2025 20:19
@Fridah-nv changed the title from "doc:[AutoDeploy]: Move AutoDeploy README.md to torch docs" to "[None][doc]: Move AutoDeploy README.md to torch docs" on Aug 5, 2025
@Fridah-nv changed the title from "[None][doc]: Move AutoDeploy README.md to torch docs" to "[None][doc] Move AutoDeploy README.md to torch docs" on Aug 5, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
docs/source/torch/auto_deploy/advanced/workflow.md (1)

7-8: Specify language for fenced code block

markdownlint (MD040) is flagging the opening fence because no language is given. Add python so syntax highlighting and tooling work correctly.

-```
+```python
docs/source/torch/auto_deploy/advanced/expert_configurations.md (2)

21-21: Fix typo: PyDantic → Pydantic

The library’s correct name is “Pydantic” (lower-case “d”). Keeping the canonical spelling avoids search/SEO issues and confusion.

-...powered by PyDantic Settings and OmegaConf.
+...powered by Pydantic Settings and OmegaConf.

6-17: Unify emphasis style to asterisks to satisfy MD049

Several list items mix asterisks (*) and underscores (_) for emphasis. markdownlint (MD049) prefers a single style per document. Consider switching the underscores to asterisks for consistency—e.g.:

-_exclusively_
+*exclusively*

-_ignored_
+*ignored*

Not functional, but it removes linter warnings and keeps formatting uniform.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5361c63 and 29534cf.

📒 Files selected for processing (7)
  • docs/source/torch.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/example_run.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/expert_configurations.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/logging.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/workflow.md (1 hunks)
  • docs/source/torch/auto_deploy/auto-deploy.md (1 hunks)
  • docs/source/torch/auto_deploy/support_matrix.md (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • docs/source/torch/auto_deploy/advanced/logging.md
  • docs/source/torch/auto_deploy/auto-deploy.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • docs/source/torch.md
  • docs/source/torch/auto_deploy/advanced/example_run.md
  • docs/source/torch/auto_deploy/support_matrix.md
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • docs/source/torch/auto_deploy/advanced/expert_configurations.md
  • docs/source/torch/auto_deploy/advanced/workflow.md
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • docs/source/torch/auto_deploy/advanced/expert_configurations.md
  • docs/source/torch/auto_deploy/advanced/workflow.md
📚 Learning: applies to **/*.py : the code developed for tensorrt-llm should conform to python 3.8+....
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-05T07:49:09.875Z
Learning: Applies to **/*.py : The code developed for TensorRT-LLM should conform to Python 3.8+.

Applied to files:

  • docs/source/torch/auto_deploy/advanced/expert_configurations.md
🪛 markdownlint-cli2 (0.17.2)
docs/source/torch/auto_deploy/advanced/expert_configurations.md

7-7: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


7-7: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


12-12: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


12-12: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)

docs/source/torch/auto_deploy/advanced/workflow.md

7-7: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check

@Fridah-nv force-pushed the user/fridah/ad-doc branch from 29534cf to 6ef577a on August 5, 2025 20:41
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
docs/source/torch/auto_deploy/advanced/expert_configurations.md (1)

21-21: Typo: “PyDantic” → “Pydantic”
The library’s canonical casing is “Pydantic”. Re-introducing the wrong variant will confuse readers and break automated linkification.

- …parser powered by PyDantic Settings and OmegaConf.
+ …parser powered by Pydantic Settings and OmegaConf.
🧹 Nitpick comments (1)
docs/source/torch/auto_deploy/advanced/expert_configurations.md (1)

6-13: Style-guide violation: use asterisks for emphasis (MD049)
The project’s markdown-lint rules expect *italic*, not _italic_. Two underscores trigger the linter on Lines 7 and 12.

-  _exclusively_ exposed in the `AutoDeployConfig`
+  *exclusively* exposed in the `AutoDeployConfig`

-  object are overlapping, duplicated, and/or _ignored_ in AutoDeploy
+  object are overlapping, duplicated, and/or *ignored* in AutoDeploy
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 29534cf and 6ef577a.

📒 Files selected for processing (7)
  • docs/source/torch.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/example_run.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/expert_configurations.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/logging.md (1 hunks)
  • docs/source/torch/auto_deploy/advanced/workflow.md (1 hunks)
  • docs/source/torch/auto_deploy/auto-deploy.md (1 hunks)
  • docs/source/torch/auto_deploy/support_matrix.md (1 hunks)
✅ Files skipped from review due to trivial changes (4)
  • docs/source/torch/auto_deploy/advanced/logging.md
  • docs/source/torch/auto_deploy/advanced/workflow.md
  • docs/source/torch/auto_deploy/auto-deploy.md
  • docs/source/torch/auto_deploy/support_matrix.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/source/torch.md
  • docs/source/torch/auto_deploy/advanced/example_run.md
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • docs/source/torch/auto_deploy/advanced/expert_configurations.md
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • docs/source/torch/auto_deploy/advanced/expert_configurations.md
📚 Learning: applies to **/*.py : the code developed for tensorrt-llm should conform to python 3.8+....
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-05T07:49:09.875Z
Learning: Applies to **/*.py : The code developed for TensorRT-LLM should conform to Python 3.8+.

Applied to files:

  • docs/source/torch/auto_deploy/advanced/expert_configurations.md
🪛 markdownlint-cli2 (0.17.2)
docs/source/torch/auto_deploy/advanced/expert_configurations.md

7-7: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


7-7: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


12-12: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


12-12: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)

🔇 Additional comments (1)
docs/source/torch/auto_deploy/advanced/expert_configurations.md (1)

40-43: Inconsistent compile_backend examples may confuse users
The CLI sample sets --args.compile-backend=torch-opt, while the YAML examples use compile_backend: torch-compile and torch-opt. Please confirm both values are valid and, if so, briefly mention the accepted enum to avoid readers assuming one is a typo.

Also applies to: 69-70, 100-101

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (5)
docs/source/torch/auto_deploy/advanced/benchmarking_with_trtllm_bench.md (2)

43-47: Inconsistent default for compile_backend may confuse users

The YAML example sets compile_backend: torch-opt, but the reference table (Line 73) indicates a default of torch-compile. Either the example or the table default should be aligned so users are not left uncertain about which backend is recommended out-of-the-box.


65-65: Minor terminology polish: “Multi-GPU”

Multi-gpu should be capitalised as “Multi-GPU” to follow common GPU acronym styling used throughout the docs.

docs/source/torch/auto_deploy/auto-deploy.md (3)

3-6: Redundant “Note:” label inside the {note} admonition

The MyST admonition already renders a “Note” header; the explicit Note: inside the block is superfluous. Dropping it avoids a duplicated label.

 ```{note}
-Note:
 This project is in active development … reliability.
 ```

41-41: Typo: "LLama" → "Llama"

Capitalize only the first "L" to match the official model name.

52-56: Section title "Support Matrix" precedes workflow text

Immediately after the "## Support Matrix" heading, the content describes the workflow rather than the matrix and then links to a separate page. Consider either moving the workflow paragraph above the heading or renaming this heading to "Workflow Overview" for clarity.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ef577a594d88f2041227fdf7b92dd3c6710d6c1 and e5d81ee921d8ef3db4961b763af96039494da404.

⛔ Files ignored due to path filters (1)
  • docs/source/torch/auto_deploy/ad_overview.png is excluded by !**/*.png
📒 Files selected for processing (2)
  • docs/source/torch/auto_deploy/advanced/benchmarking_with_trtllm_bench.md (1 hunks)
  • docs/source/torch/auto_deploy/auto-deploy.md (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • docs/source/torch/auto_deploy/advanced/benchmarking_with_trtllm_bench.md
  • docs/source/torch/auto_deploy/auto-deploy.md
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • docs/source/torch/auto_deploy/advanced/benchmarking_with_trtllm_bench.md
  • docs/source/torch/auto_deploy/auto-deploy.md
🪛 markdownlint-cli2 (0.17.2)
docs/source/torch/auto_deploy/auto-deploy.md

19-19: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check

Collaborator

@suyoggupta suyoggupta left a comment


Thanks so much @Fridah-nv for helping with this

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
docs/source/torch/auto_deploy/auto-deploy.md (1)

8-8: Replace raw <h4> with Markdown heading to preserve TOC
Raw HTML headings break Sphinx-generated navigation and violate MD001. Switch to a proper “## …” Markdown heading.

🧹 Nitpick comments (3)
docs/source/torch/auto_deploy/auto-deploy.md (3)

19-19: Heading hierarchy skips a level (MD001)
### **Key Features:** jumps from H1 to H3. Change it to ## Key Features (or introduce an intermediate H2 before any H3s); a compliant hierarchy is sketched at the end of this list.


12-15: Use Markdown image/figure syntax instead of raw HTML
The <div>/<img> block may not render width/alt text consistently in Sphinx and disables automatic figure numbering. Prefer:

```{figure} ../../media/ad_overview.png
:width: 70%
:align: center

AutoDeploy overview and relation with TensorRT-LLM’s LLM API
```

---

41-41: Typo: “LLama” → “Llama”
Maintain correct model name casing in examples.
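
For reference, a minimal MD001-compliant sketch (heading names are illustrative, loosely based on the doc under review):

```markdown
# AutoDeploy

## Key Features

### Optimized Inference
```

Each level increments by exactly one, which satisfies markdownlint's heading-increment rule and lets Sphinx build a clean TOC.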


<details>
<summary>📜 Review details</summary>

**Configuration used: .coderabbit.yaml**
**Review profile: CHILL**
**Plan: Pro**


<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 45664956fda946decffcd366c7f77bb6c098a75c and 1e03c7d05e1529ace47cf09af524e9c33a41a8bc.

</details>

<details>
<summary>📒 Files selected for processing (2)</summary>

* `docs/source/torch/auto_deploy/auto-deploy.md` (1 hunks)
* `docs/source/torch/auto_deploy/support_matrix.md` (1 hunks)

</details>

<details>
<summary>✅ Files skipped from review due to trivial changes (1)</summary>

* docs/source/torch/auto_deploy/support_matrix.md

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>🧠 Learnings (4)</summary>

<details>
<summary>📓 Common learnings</summary>

Learnt from: yibinl-nvidia
PR: #6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.


Learnt from: moraxu
PR: #6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.


Learnt from: galagam
PR: #6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.


</details>
<details>
<summary>📚 Learning: 2025-08-06T13:58:07.506Z</summary>

Learnt from: galagam
PR: #6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.


**Applied to files:**
- `docs/source/torch/auto_deploy/auto-deploy.md`

</details>
<details>
<summary>📚 Learning: 2025-08-01T15:14:45.673Z</summary>

Learnt from: yibinl-nvidia
PR: #6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.


**Applied to files:**
- `docs/source/torch/auto_deploy/auto-deploy.md`

</details>
<details>
<summary>📚 Learning: 2025-07-28T17:06:08.621Z</summary>

Learnt from: moraxu
PR: #6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.


**Applied to files:**
- `docs/source/torch/auto_deploy/auto-deploy.md`

</details>

</details><details>
<summary>🪛 markdownlint-cli2 (0.17.2)</summary>

<details>
<summary>docs/source/torch/auto_deploy/auto-deploy.md</summary>

19-19: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

</details>

</details>

</details>

<details>
<summary>⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)</summary>

* GitHub Check: Pre-commit Check

</details>

</details>


@Fridah-nv Fridah-nv force-pushed the user/fridah/ad-doc branch from 1e03c7d to 0c13211 on August 7, 2025 at 21:10
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
docs/source/torch/auto_deploy/auto-deploy.md (2)

3-6: Remove redundant “Note:” inside the {note} admonition

The {note} directive already renders a “Note” label; repeating it clutters the output.

````diff
-```{note}
-Note:
+```{note}
````

12-15: Consider using a MyST figure directive instead of raw HTML for the image

Pure HTML blocks are fine but prevent automatic width/alt validation and can confuse translators.
A MyST equivalent keeps docs pure-Markdown:

```{figure} ../../media/ad_overview.png
:alt: AutoDeploy integration with LLM API
:width: 70%

AutoDeploy overview and relation with TensorRT-LLM's LLM API
```

<details>
<summary>📜 Review details</summary>

**Configuration used: .coderabbit.yaml**
**Review profile: CHILL**
**Plan: Pro**


<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 1e03c7d05e1529ace47cf09af524e9c33a41a8bc and 0c132115814859b4fce873deb49dee5b95c944c3.

</details>

<details>
<summary>⛔ Files ignored due to path filters (1)</summary>

* `docs/source/media/ad_overview.png` is excluded by `!**/*.png`

</details>

<details>
<summary>📒 Files selected for processing (2)</summary>

* `docs/source/torch/auto_deploy/auto-deploy.md` (1 hunks)
* `docs/source/torch/auto_deploy/support_matrix.md` (1 hunks)

</details>

<details>
<summary>🚧 Files skipped from review as they are similar to previous changes (1)</summary>

* docs/source/torch/auto_deploy/support_matrix.md

</details>


<details>
<summary>🔇 Additional comments (1)</summary><blockquote>

<details>
<summary>docs/source/torch/auto_deploy/auto-deploy.md (1)</summary>

8-8: Replace raw HTML heading with Markdown to restore TOC generation and fix MD001

Raw `<h4>` headings bypass Sphinx/Myst-Parser’s automatic TOC & anchor generation and break the heading-increment rule reported by markdownlint (MD001).  
Convert it to an H2 so the hierarchy becomes:

H1 → H2 → H3 …

```diff
-<h4> Seamless Model Deployment from PyTorch to TRT-LLM</h4>
+## Seamless Model Deployment from PyTorch to TRT-LLM
```

</details>

</blockquote></details>

@Fridah-nv
Collaborator Author

/bot skip --comment "Doc change only"

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
docs/source/torch/auto_deploy/auto-deploy.md (1)

8-8: Replace raw HTML heading with Markdown H2 to restore Sphinx TOC/navigation

Raw <h4> breaks heading structure and Sphinx link generation. Use a Markdown heading instead.

```diff
-<h4> Seamless Model Deployment from PyTorch to TRT-LLM</h4>
+## Seamless Model Deployment from PyTorch to TRT-LLM
```
🧹 Nitpick comments (6)
docs/source/torch/auto_deploy/auto-deploy.md (6)

3-6: Simplify note admonition content

The “Note:” label inside a {note} block is redundant. Keep the content only (the cleaned-up admonition is sketched after this list).

````diff
 ```{note}
-Note:
 This project is in active development and is currently in a prototype stage. The code is experimental, subject to change, and may include backward-incompatible updates. While we strive for correctness, we provide no guarantees regarding functionality, stability, or reliability.
````

---

10-10: Use correct product names: “Hugging Face Transformers”

Normalize terminology and casing.


```diff
-AutoDeploy is a prototype designed to simplify and accelerate the deployment of PyTorch models, including off-the-shelf models like those from HuggingFace transformers library, to TensorRT-LLM.
+AutoDeploy is a prototype designed to simplify and accelerate the deployment of PyTorch models, including off-the-shelf models such as those from the Hugging Face Transformers library, to TensorRT-LLM.
```

21-21: Use proper CUDA terminology

“CudaGraph” → “CUDA Graphs.”

```diff
-- **Optimized Inference:** Built-in transformations for sharding, quantization, KV-cache integration, MHA fusion, and CudaGraph optimization.
+- **Optimized Inference:** Built-in transformations for sharding, quantization, KV-cache integration, MHA fusion, and CUDA Graphs optimization.
```

39-41: Fix typos and comma splice; standardize naming and phrasing

“LLama” → “Llama”, split the run-on sentence, “entrypoint” → “entry point”, “Huggingface” → “Hugging Face”, consistent “AutoDeploy” naming (a sample invocation of the script is sketched after this list).

```diff
-You are ready to run an in-framework LLama Demo now.
+You're ready to run an in-framework Llama demo now.
 
-The general entrypoint to run the auto-deploy demo is the `build_and_run_ad.py` script, Checkpoints are loaded directly from Huggingface (HF) or a local HF-like directory:
+The general entry point for the AutoDeploy demo is the `build_and_run_ad.py` script. Checkpoints are loaded directly from Hugging Face (HF) or a local HF-like directory:
```

58-63: Minor title/style consistency in Advanced Usage list

Keep titles concise and consistent.

```diff
-- [Example Run Script](./advanced/example_run.md)
-- [Logging Level](./advanced/logging.md)
-- [Incorporating AutoDeploy into Your Own Workflow](./advanced/workflow.md)
+- [Example run script](./advanced/example_run.md)
+- [Logging levels](./advanced/logging.md)
+- [Integrating AutoDeploy into your workflow](./advanced/workflow.md)
 - [Expert Configurations](./advanced/expert_configurations.md)
-- [Performance benchmarking](./advanced/benchmarking_with_trtllm_bench.md)
+- [Performance benchmarking](./advanced/benchmarking_with_trtllm_bench.md)
```

80-82: Fix “GitHub” casing and comma splice

Use correct brand casing and avoid comma splices for clarity.

```diff
-To track development progress and contribute, visit our [Github Project Board](https://github.com/orgs/NVIDIA/projects/83/views/13).
-We welcome community contributions, see `examples/auto_deploy/CONTRIBUTING.md` for guidelines.
+To track development progress and contribute, visit our [GitHub Project Board](https://github.com/orgs/NVIDIA/projects/83/views/13).
+We welcome community contributions. See `examples/auto_deploy/CONTRIBUTING.md` for guidelines.
```
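
For reference, the cleaned-up admonition from the 3-6 nitpick would read as follows once the redundant label is dropped (content taken verbatim from the suggested diff):

```{note}
This project is in active development and is currently in a prototype stage. The code is experimental, subject to change, and may include backward-incompatible updates. While we strive for correctness, we provide no guarantees regarding functionality, stability, or reliability.
```

And a minimal sketch of the demo invocation referenced in the 39-41 nitpick; the `--model` flag and the model name are assumptions for illustration, so consult the script's `--help` for the actual interface:

```bash
# Run the AutoDeploy demo against a Hugging Face checkpoint
# (or point --model at a local HF-like directory).
cd examples/auto_deploy
python build_and_run_ad.py --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
```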
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0c13211 and 55157c5.

📒 Files selected for processing (1)
  • docs/source/torch/auto_deploy/auto-deploy.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/source/torch/auto_deploy/auto-deploy.md

17-17: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check

@tensorrt-cicd
Collaborator

PR_Github #14635 [ skip ] triggered by Bot

@Fridah-nv Fridah-nv enabled auto-merge (squash) August 8, 2025 17:58
@Fridah-nv Fridah-nv disabled auto-merge August 8, 2025 17:59
@nv-guomingz nv-guomingz requested a review from chenopis August 8, 2025 18:00
@tensorrt-cicd
Collaborator

PR_Github #14635 [ skip ] completed with state SUCCESS
Skipping testing for commit 55157c5

Collaborator

@chenopis chenopis left a comment


Added some suggestions, but LGTM overall.

@Fridah-nv
Collaborator Author

Fridah-nv commented Aug 8, 2025

> Added some suggestions, but LGTM overall.

Thank you @chenopis for the valuable feedback! The doc looks better indeed.

Fridah-nv and others added 5 commits August 8, 2025 15:43
Signed-off-by: Frida Hou <[email protected]>

move autodeploy doc into torch, update links

Signed-off-by: Frida Hou <[email protected]>

update contents

Signed-off-by: Frida Hou <[email protected]>

replace hyperlink with modular path

Signed-off-by: Frida Hou <[email protected]>

minor

Signed-off-by: Frida Hou <[email protected]>
Signed-off-by: Frida Hou <[email protected]>

minor fix

Signed-off-by: Frida Hou <[email protected]>

minor fix

Signed-off-by: Frida Hou <[email protected]>
Signed-off-by: Frida Hou <[email protected]>
Signed-off-by: Frida Hou <[email protected]>
@Fridah-nv Fridah-nv force-pushed the user/fridah/ad-doc branch from ac07367 to 3ad4cfe on August 8, 2025 at 22:43
@Fridah-nv
Copy link
Collaborator Author

/bot skip --comment "Doc change only"

@Fridah-nv Fridah-nv enabled auto-merge (squash) August 8, 2025 22:47
@tensorrt-cicd
Copy link
Collaborator

PR_Github #14653 [ skip ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14653 [ skip ] completed with state SUCCESS
Skipping testing for commit 3ad4cfe

@Fridah-nv Fridah-nv merged commit cc0f4c8 into NVIDIA:main Aug 8, 2025
4 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in AutoDeploy Board Aug 8, 2025