[None][doc] Move AutoDeploy README.md to torch docs #6528
Conversation
Caution: Review failed — failed to post review comments.

Configuration used: .coderabbit.yaml
📒 Files selected for processing (8)
✅ Files skipped from review due to trivial changes (2)
🚧 Files skipped from review as they are similar to previous changes (4)

🪛 markdownlint-cli2 (0.17.2)

docs/source/torch/auto_deploy/advanced/workflow.md

7-7: Fenced code blocks should have a language specified (MD040, fenced-code-language)

docs/source/torch/auto_deploy/auto-deploy.md

7-7: Heading levels should only increment by one level at a time (MD001, heading-increment)
16-16: Trailing punctuation in heading (MD026, no-trailing-punctuation)

🔇 Additional comments (1)
📝 Walkthrough

This update introduces comprehensive documentation for the new experimental "AutoDeploy" feature, which enables seamless deployment of PyTorch models to TRT-LLM. The changes add multiple markdown files covering installation, usage, advanced configuration, logging, workflow integration, benchmarking, and a detailed support matrix. No code or public API changes are included.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Documentation
    User->>Documentation: Reads AutoDeploy overview
    User->>Documentation: Consults support matrix for model/backend compatibility
    User->>Documentation: Reviews advanced configuration and example usage
    User->>Documentation: Learns about logging, workflow integration, and benchmarking
```
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Actionable comments posted: 4
🧹 Nitpick comments (8)
docs/source/auto_deploy/advanced/logging.md (2)

3-4: Minor wording / punctuation nit

The semicolon after "verbosity" reads oddly. A colon works better:

```diff
-Use the following env variable to specify the logging level of our built-in logger ordered by
-decreasing verbosity;
+Use the following environment variable to specify the logging level of the built-in logger, ordered by
+decreasing verbosity:
```
6-12: Reduce repetition – show level placeholder once

Five identical assignments suggest users need five exports; actually only one is needed. Consider:

```diff
-AUTO_DEPLOY_LOG_LEVEL=DEBUG
-AUTO_DEPLOY_LOG_LEVEL=INFO
-AUTO_DEPLOY_LOG_LEVEL=WARNING
-AUTO_DEPLOY_LOG_LEVEL=ERROR
-AUTO_DEPLOY_LOG_LEVEL=INTERNAL_ERROR
+# Choose one of: DEBUG | INFO | WARNING | ERROR | INTERNAL_ERROR
+export AUTO_DEPLOY_LOG_LEVEL=INFO
```

This keeps the doc concise and avoids the impression multiple exports are required.
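As an aside on the single-export pattern: a hypothetical consumer of `AUTO_DEPLOY_LOG_LEVEL` might map the value onto Python's standard logging levels. The mapping below is an illustrative assumption, not AutoDeploy's actual implementation (in particular, treating `INTERNAL_ERROR` as `CRITICAL` is a guess):

```python
import logging
import os

# Hypothetical mapping of the documented level names onto stdlib logging levels.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "INTERNAL_ERROR": logging.CRITICAL,  # assumed: stricter than ERROR
}

def resolve_log_level(default: str = "INFO") -> int:
    """Read AUTO_DEPLOY_LOG_LEVEL from the environment, falling back to a default."""
    name = os.environ.get("AUTO_DEPLOY_LOG_LEVEL", default).upper()
    return _LEVELS.get(name, _LEVELS[default])

os.environ["AUTO_DEPLOY_LOG_LEVEL"] = "DEBUG"
print(resolve_log_level())  # 10 (logging.DEBUG)
```

Because only one value is ever read, a single `export` is indeed all a user needs.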
docs/source/auto_deploy/advanced/mixed_precision_quantization.md (1)

14-17: Ensure CLI flag syntax consistency (`--arg=value`)

Most examples in the AutoDeploy docs use the `--key=value` style. Here a space is used, which can break parsing for many CLI frameworks.

```diff
-python build_and_run_ad.py --model "<MODELOPT_CKPT_PATH>" --args.world-size 1
+python build_and_run_ad.py --model "<MODELOPT_CKPT_PATH>" --args.world-size=1
```

docs/source/auto-deploy.md (2)
8-10: Replace raw HTML header with a Markdown heading.

Sphinx + MyST renders standard Markdown headings more reliably than embedded HTML.

```diff
-<h4> Seamless Model Deployment from PyTorch to TRT-LLM</h4>
+#### Seamless Model Deployment from PyTorch to TRT-LLM
```
41-43: Fix mixed-case "LLama" spelling.

The line uses "LLama", whereas the project consistently uses "LLaMA".

```diff
-You are ready to run an in-framework LLama Demo now.
+You are ready to run an in-framework LLaMA demo now.
```

docs/source/auto_deploy/advanced/expert_configurations.md (1)
6-16: Unify emphasis style to satisfy markdown-lint (MD049).

The file mixes underscores and asterisks for emphasis; the linter expects asterisks.

```diff
-_exclusively_
+*exclusively*
```

(Apply similarly to other occurrences.)
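A mechanical fix like this can be scripted. The sketch below is an illustrative helper (not part of the repo or of markdownlint) that rewrites simple single-underscore emphasis to asterisks while leaving snake_case identifiers such as `llm_args.py` untouched:

```python
import re

# Matches _emphasis_ delimited at word boundaries; the lookarounds skip
# underscores surrounded by word characters, i.e. snake_case identifiers.
_UNDERSCORE_EM = re.compile(r"(?<!\w)_([^_\s](?:[^_]*[^_\s])?)_(?!\w)")

def underscores_to_asterisks(text: str) -> str:
    """Convert _emphasis_ spans to *emphasis* spans, MD049-style."""
    return _UNDERSCORE_EM.sub(r"*\1*", text)

print(underscores_to_asterisks("overlapping, duplicated, and/or _ignored_ in AutoDeploy"))
# overlapping, duplicated, and/or *ignored* in AutoDeploy
print(underscores_to_asterisks("see llm_args.py for details"))
# see llm_args.py for details
```

Real Markdown has corner cases (code spans, fenced blocks) that a regex pass ignores, so treat this as a one-shot cleanup aid rather than a lint replacement.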
docs/source/auto_deploy/advanced/model_eval.md (2)

3-3: Add direct reference & correct capitalization for LM Evaluation Harness.

Consider linking to the official repo (https://github.com/EleutherAI/lm-evaluation-harness) and capitalizing the tool's name for consistency with other docs.

6-7: Clarify the "model is defined the same as above" cross-reference.

This file is self-contained; readers landing here directly won't know which "above" section you mean. Either include the earlier snippet inline or link explicitly to the relevant doc section.
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (9)

- docs/source/auto-deploy.md (1 hunks)
- docs/source/auto_deploy/advanced/example_run.md (1 hunks)
- docs/source/auto_deploy/advanced/expert_configurations.md (1 hunks)
- docs/source/auto_deploy/advanced/logging.md (1 hunks)
- docs/source/auto_deploy/advanced/mixed_precision_quantization.md (1 hunks)
- docs/source/auto_deploy/advanced/model_eval.md (1 hunks)
- docs/source/auto_deploy/advanced/workflow.md (1 hunks)
- docs/source/auto_deploy/support_matrix.md (1 hunks)
- docs/source/index.rst (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
📚 Learning applied to files:

- docs/source/auto_deploy/advanced/workflow.md
- docs/source/auto_deploy/support_matrix.md
- docs/source/auto-deploy.md
- docs/source/auto_deploy/advanced/expert_configurations.md
🪛 markdownlint-cli2 (0.17.2)
docs/source/auto_deploy/advanced/workflow.md
7-7: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/source/auto_deploy/advanced/expert_configurations.md
7-7: Emphasis style
Expected: asterisk; Actual: underscore
(MD049, emphasis-style)
7-7: Emphasis style
Expected: asterisk; Actual: underscore
(MD049, emphasis-style)
12-12: Emphasis style
Expected: asterisk; Actual: underscore
(MD049, emphasis-style)
12-12: Emphasis style
Expected: asterisk; Actual: underscore
(MD049, emphasis-style)
🔇 Additional comments (3)
docs/source/auto_deploy/advanced/workflow.md (1)
30-32
: Hard-coded relative links will not resolve in published docs
../../tensorrt_llm/_torch/auto_deploy/llm.py
points outsidedocs/
and Sphinx cannot copy that file.Use one of:
• A GitHub permalink, e.g.https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/_torch/auto_deploy/llm.py
• Or, if API docs are generated, cross-link with:py:class:
tensorrt_llm._torch.auto_deploy.LLM``.Broken links will fail the linkcheck job.
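The failure mode is easy to reproduce with plain path arithmetic. The sketch below uses a made-up doc location and enough `../` segments to climb out of `docs/` entirely; resolving the relative link shows it lands outside the docs tree, which is exactly the target Sphinx cannot copy:

```python
from pathlib import PurePosixPath

# Hypothetical layout for illustration: a markdown file deep under docs/source/.
doc_file = PurePosixPath("docs/source/torch/auto_deploy/advanced/workflow.md")
link = "../../../../../tensorrt_llm/_torch/auto_deploy/llm.py"

# Resolve the link relative to the markdown file's directory.
parts = list(doc_file.parent.parts)
for segment in link.split("/"):
    if segment == "..":
        parts.pop()          # each ".." climbs one directory
    else:
        parts.append(segment)
resolved = PurePosixPath(*parts)

print(resolved)                           # tensorrt_llm/_torch/auto_deploy/llm.py
print(str(resolved).startswith("docs/"))  # False -> outside the docs tree
```

A GitHub permalink or a `:py:class:` cross-reference sidesteps the problem because neither asks Sphinx to copy a file from outside its source directory.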
docs/source/auto_deploy/advanced/example_run.md (1)

26-29: Spelling / consistency of runtime value

`--args.runtime "demollm"` looks suspicious – elsewhere in the project, is the engine spelled `demo_llm` or `demollm`? Please verify the exact token; a typo will cause argument-parsing errors at runtime.
? Please verify the exact token; a typo will cause argument-parsing errors at runtime.docs/source/auto_deploy/advanced/model_eval.md (1)
6-11
: Quote--model_args
to avoid shell-level splitting.Without quotes, Bash treats the comma as an argument separator, so only
model=meta-llama/Meta-Llama-3.1-8B-Instruct
is passed to--model_args
andworld_size=2
becomes a standalone positional arg, causing the script to fail.-python lm_eval_ad.py \ ---model autodeploy --model_args model=meta-llama/Meta-Llama-3.1-8B-Instruct,world_size=2 --tasks mmlu +python lm_eval_ad.py \ + --model autodeploy \ + --model_args "model=meta-llama/Meta-Llama-3.1-8B-Instruct,world_size=2" \ + --tasks mmlu⛔ Skipped due to learnings
Learnt from: moraxu PR: NVIDIA/TensorRT-LLM#6303 File: tests/integration/test_lists/qa/examples_test_list.txt:494-494 Timestamp: 2025-07-28T17:06:08.621Z Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
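For context on why the value must arrive as one token: LM Evaluation Harness parses `--model_args` itself as comma-separated `key=value` pairs. A minimal re-implementation of that parsing style, as an illustrative sketch rather than the harness's actual code (the real parser also coerces value types):

```python
def parse_model_args(spec: str) -> dict:
    """Parse an lm-eval-style 'key=value,key=value' string into a dict."""
    args = {}
    for pair in spec.split(","):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        args[key.strip()] = value.strip()
    return args

print(parse_model_args("model=meta-llama/Meta-Llama-3.1-8B-Instruct,world_size=2"))
# {'model': 'meta-llama/Meta-Llama-3.1-8B-Instruct', 'world_size': '2'}
```

If the shell were ever to break the string apart, `world_size=2` would never reach this parser, which is why defensive quoting is worth keeping in the docs.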
Actionable comments posted: 1

🧹 Nitpick comments (1)

docs/source/torch/auto_deploy/advanced/expert_configurations.md (1)

6-12: Broken links & inconsistent emphasis; switch to Sphinx cross-refs

- `AutoDeployConfig` links back to `llm_args.py` rather than the actual config module.
- `_exclusively_` / `_ignored_` use underscore emphasis while the rest of the doc uses asterisks – flagged by MD049.
- Direct source-file links are fragile in rendered docs; prefer `:py:class:` or an absolute GitHub URL.

```diff
- _exclusively_ exposed in the [`AutoDeployConfig` class](../../../../../tensorrt_llm/_torch/auto_deploy/llm_args.py).
+ *exclusively* exposed in the :py:class:`tensorrt_llm._torch.auto_deploy.auto_deploy_config.AutoDeployConfig`.
 ...
- object are overlapping, duplicated, and/or _ignored_ in AutoDeploy
+ object are overlapping, duplicated, and/or *ignored* in AutoDeploy
```
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (7)

- docs/source/torch.md (1 hunks)
- docs/source/torch/auto_deploy/advanced/example_run.md (1 hunks)
- docs/source/torch/auto_deploy/advanced/expert_configurations.md (1 hunks)
- docs/source/torch/auto_deploy/advanced/logging.md (1 hunks)
- docs/source/torch/auto_deploy/advanced/workflow.md (1 hunks)
- docs/source/torch/auto_deploy/auto-deploy.md (1 hunks)
- docs/source/torch/auto_deploy/support_matrix.md (1 hunks)
✅ Files skipped from review due to trivial changes (6)
- docs/source/torch/auto_deploy/advanced/logging.md
- docs/source/torch.md
- docs/source/torch/auto_deploy/advanced/example_run.md
- docs/source/torch/auto_deploy/advanced/workflow.md
- docs/source/torch/auto_deploy/auto-deploy.md
- docs/source/torch/auto_deploy/support_matrix.md
🧰 Additional context used
🧠 Learnings (2): the moraxu CLI-flow/PyTorch-API learning quoted above.

Applied to files:

- docs/source/torch/auto_deploy/advanced/expert_configurations.md
🪛 markdownlint-cli2 (0.17.2)

docs/source/torch/auto_deploy/advanced/expert_configurations.md

7-7, 12-12: Emphasis style: expected asterisk, actual underscore (MD049, emphasis-style)
🔇 Additional comments (1)

docs/source/torch/auto_deploy/advanced/expert_configurations.md (1)

170-176: Confirm default config path is correct

`cat tensorrt_llm/_torch/auto_deploy/config/default.yaml` assumes the file lives inside `_torch/auto_deploy/config/`. In the current tree the directory is usually `.../auto_deploy/configs/` (plural). Please verify the exact location or update the path; otherwise the example will mislead users.
Actionable comments posted: 0

♻️ Duplicate comments (1)

docs/source/torch/auto_deploy/advanced/expert_configurations.md (1)

21-27: Fix "PyDantic" typo and broken `llm_args` link

Same issue flagged previously; still unfixed.

```diff
-… through a flexible argument parser powered by PyDantic Settings and OmegaConf.
+… through a flexible argument parser powered by Pydantic Settings and OmegaConf.
-… in `tensorrt_llm/_torch/auto_deploy/llm_args`.
+… in `tensorrt_llm/_torch/auto_deploy/llm_args.py`.
```
🧹 Nitpick comments (1)

docs/source/torch/auto_deploy/advanced/workflow.md (1)

7-28: Add a language spec to the fenced code block

Missing a language identifier violates MD040 and prevents syntax highlighting. Tag the opening fence with `python`; the closing fence is unchanged.
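The MD040 check itself is simple to approximate. This standalone sketch (illustrative, not markdownlint's implementation) reports opening fences that carry no language tag:

```python
def untagged_fences(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening code fences with no language tag.

    Rough approximation of markdownlint's MD040; real fences can also use
    ~~~ delimiters and indentation, which this sketch ignores.
    """
    offenders = []
    inside = False
    for i, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("```"):
            # An opening fence that is exactly ``` has no language tag.
            if not inside and stripped == "```":
                offenders.append(i)
            inside = not inside
    return offenders

doc = "intro\n```\nfrom tensorrt_llm._torch.auto_deploy import LLM\n```\n"
print(untagged_fences(doc))  # [2]
```

Running a check like this (or markdownlint itself) in pre-commit catches the issue before the docs build warns.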
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (4)

- docs/source/torch/auto_deploy/advanced/example_run.md (1 hunks)
- docs/source/torch/auto_deploy/advanced/expert_configurations.md (1 hunks)
- docs/source/torch/auto_deploy/advanced/workflow.md (1 hunks)
- docs/source/torch/auto_deploy/auto-deploy.md (1 hunks)
✅ Files skipped from review due to trivial changes (2)
- docs/source/torch/auto_deploy/advanced/example_run.md
- docs/source/torch/auto_deploy/auto-deploy.md
🧰 Additional context used
🧠 Learnings (2): the moraxu CLI-flow/PyTorch-API learning quoted above.

Applied to files:

- docs/source/torch/auto_deploy/advanced/expert_configurations.md
- docs/source/torch/auto_deploy/advanced/workflow.md
🪛 markdownlint-cli2 (0.17.2)

docs/source/torch/auto_deploy/advanced/expert_configurations.md

7-7, 12-12: Emphasis style: expected asterisk, actual underscore (MD049, emphasis-style)

docs/source/torch/auto_deploy/advanced/workflow.md

7-7: Fenced code blocks should have a language specified (MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
Force-pushed: ed9f75a → 5361c63 (compare)
Actionable comments posted: 1

♻️ Duplicate comments (2)

docs/source/torch/auto_deploy/advanced/expert_configurations.md (2)

21-21: "PyDantic" → "Pydantic" (typo still present)

The same typo was flagged in a previous review but remains unfixed.

```diff
-argument parser powered by PyDantic Settings
+argument parser powered by Pydantic Settings
```

21-21: Fix 404-prone link – missing `.py` extension

The `LlmArgs` link ends without `.py`, producing a 404.

```diff
-.../tensorrt_llm/_torch/auto_deploy/llm_args)
+.../tensorrt_llm/_torch/auto_deploy/llm_args.py)
```
🧹 Nitpick comments (5)

docs/source/torch/auto_deploy/advanced/workflow.md (3)

7-8: Add a language identifier to the fenced code block

markdownlint (MD040) flags the fenced code block because it lacks a language spec. Specify `python` so syntax highlighting works and the docs build does not warn. Also applies to line 28.

12-26: Replace angle-bracket placeholders with literal examples or back-tick them

The placeholders (`<HF_MODEL_CARD_OR_DIR>`, `<DESIRED_WORLD_SIZE>`, etc.) render as HTML tags and disappear in the generated docs, confusing readers. Either:

- Wrap them in back-ticks, or
- Provide concrete example values (e.g., `"meta-llama/Llama-2-7b-chat-hf"`).

```diff
- model=<HF_MODEL_CARD_OR_DIR>,
+ model="<HF_MODEL_CARD_OR_DIR>",
```

30-32: Use proper Sphinx cross-refs for internal API links

`LLM` and `AutoDeployConfig` are referenced as plain code. Converting to :py:class:`tensorrt_llm._torch.auto_deploy.llm.LLM` and :py:class:`tensorrt_llm._torch.auto_deploy.llm_args.AutoDeployConfig` enables inter-doc linking and prevents broken anchors.

docs/source/torch/auto_deploy/support_matrix.md (1)
59-60: Broken external link formatting

The `flashinfer` link includes a trailing ".git" which needlessly appears in the docs. Use the bare repo URL or a short link:

```diff
-[flashinfer](https://github.com/flashinfer-ai/flashinfer.git)
+[flashinfer](https://github.com/flashinfer-ai/flashinfer)
```

docs/source/torch/auto_deploy/advanced/expert_configurations.md (1)

6-16: Prefer asterisk emphasis to comply with MD049

Several underscores (`_exclusively_`, `_ignored_`) trigger MD049. Switch to `*exclusively*`, `*ignored*` for lint-clean docs.
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (7)

- docs/source/torch.md (1 hunks)
- docs/source/torch/auto_deploy/advanced/example_run.md (1 hunks)
- docs/source/torch/auto_deploy/advanced/expert_configurations.md (1 hunks)
- docs/source/torch/auto_deploy/advanced/logging.md (1 hunks)
- docs/source/torch/auto_deploy/advanced/workflow.md (1 hunks)
- docs/source/torch/auto_deploy/auto-deploy.md (1 hunks)
- docs/source/torch/auto_deploy/support_matrix.md (1 hunks)
✅ Files skipped from review due to trivial changes (3)
- docs/source/torch/auto_deploy/advanced/logging.md
- docs/source/torch/auto_deploy/advanced/example_run.md
- docs/source/torch/auto_deploy/auto-deploy.md
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/source/torch.md
🧰 Additional context used
🧠 Learnings (2): the moraxu CLI-flow/PyTorch-API learning quoted above.

Applied to files:

- docs/source/torch/auto_deploy/support_matrix.md
- docs/source/torch/auto_deploy/advanced/expert_configurations.md
- docs/source/torch/auto_deploy/advanced/workflow.md
🪛 markdownlint-cli2 (0.17.2)

docs/source/torch/auto_deploy/advanced/expert_configurations.md

7-7, 12-12: Emphasis style: expected asterisk, actual underscore (MD049, emphasis-style)

docs/source/torch/auto_deploy/advanced/workflow.md

7-7: Fenced code blocks should have a language specified (MD040, fenced-code-language)
Force-pushed: 5361c63 → 29534cf (compare)
Actionable comments posted: 0

🧹 Nitpick comments (3)

docs/source/torch/auto_deploy/advanced/workflow.md (1)

7-8: Specify language for fenced code block

markdownlint (MD040) is flagging the opening fence because no language is given. Add `python` so syntax highlighting and tooling work correctly.

docs/source/torch/auto_deploy/advanced/expert_configurations.md (2)

21-21: Fix typo: `PyDantic` → `Pydantic`

The library's correct name is "Pydantic" (lower-case "d"). Keeping the canonical spelling avoids search/SEO issues and confusion.

```diff
-...powered by PyDantic Settings and OmegaConf.
+...powered by Pydantic Settings and OmegaConf.
```

6-17: Unify emphasis style to asterisks to satisfy MD049

Several list items mix asterisks (`*`) and underscores (`_`) for emphasis. markdownlint (MD049) prefers a single style per document. Consider switching the underscores to asterisks for consistency, e.g.:

```diff
-_exclusively_
+*exclusively*
-_ignored_
+*ignored*
```

Not functional, but it removes linter warnings and keeps formatting uniform.
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (7)

- docs/source/torch.md (1 hunks)
- docs/source/torch/auto_deploy/advanced/example_run.md (1 hunks)
- docs/source/torch/auto_deploy/advanced/expert_configurations.md (1 hunks)
- docs/source/torch/auto_deploy/advanced/logging.md (1 hunks)
- docs/source/torch/auto_deploy/advanced/workflow.md (1 hunks)
- docs/source/torch/auto_deploy/auto-deploy.md (1 hunks)
- docs/source/torch/auto_deploy/support_matrix.md (1 hunks)
✅ Files skipped from review due to trivial changes (2)
- docs/source/torch/auto_deploy/advanced/logging.md
- docs/source/torch/auto_deploy/auto-deploy.md
🚧 Files skipped from review as they are similar to previous changes (3)
- docs/source/torch.md
- docs/source/torch/auto_deploy/advanced/example_run.md
- docs/source/torch/auto_deploy/support_matrix.md
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings

Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, the examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

(The moraxu CLI-flow/PyTorch-API learning quoted above also applies.)

📚 Learning applied to files:

- docs/source/torch/auto_deploy/advanced/expert_configurations.md
- docs/source/torch/auto_deploy/advanced/workflow.md

Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-05T07:49:09.875Z
Learning: Applies to **/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+.

Applied to files:

- docs/source/torch/auto_deploy/advanced/expert_configurations.md
🪛 markdownlint-cli2 (0.17.2)

docs/source/torch/auto_deploy/advanced/expert_configurations.md

7-7, 12-12: Emphasis style: expected asterisk, actual underscore (MD049, emphasis-style)

docs/source/torch/auto_deploy/advanced/workflow.md

7-7: Fenced code blocks should have a language specified (MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms (1)

- GitHub Check: Pre-commit Check
Force-pushed: 29534cf → 6ef577a (compare)
Actionable comments posted: 0

♻️ Duplicate comments (1)

docs/source/torch/auto_deploy/advanced/expert_configurations.md (1)

21-21: Typo: "PyDantic" → "Pydantic"

The library's canonical casing is "Pydantic". Re-introducing the wrong variant will confuse readers and break automated linkification.

```diff
-… parser powered by PyDantic Settings and OmegaConf.
+… parser powered by Pydantic Settings and OmegaConf.
```
🧹 Nitpick comments (1)

docs/source/torch/auto_deploy/advanced/expert_configurations.md (1)

6-13: Style-guide violation: use asterisks for emphasis (MD049)

The project's markdown-lint rules expect `*italic*`, not `_italic_`. Two underscores trigger the linter on lines 7 and 12.

```diff
- _exclusively_ exposed in the `AutoDeployConfig`
+ *exclusively* exposed in the `AutoDeployConfig`
- object are overlapping, duplicated, and/or _ignored_ in AutoDeploy
+ object are overlapping, duplicated, and/or *ignored* in AutoDeploy
```
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (7)

- docs/source/torch.md (1 hunks)
- docs/source/torch/auto_deploy/advanced/example_run.md (1 hunks)
- docs/source/torch/auto_deploy/advanced/expert_configurations.md (1 hunks)
- docs/source/torch/auto_deploy/advanced/logging.md (1 hunks)
- docs/source/torch/auto_deploy/advanced/workflow.md (1 hunks)
- docs/source/torch/auto_deploy/auto-deploy.md (1 hunks)
- docs/source/torch/auto_deploy/support_matrix.md (1 hunks)
✅ Files skipped from review due to trivial changes (4)
- docs/source/torch/auto_deploy/advanced/logging.md
- docs/source/torch/auto_deploy/advanced/workflow.md
- docs/source/torch/auto_deploy/auto-deploy.md
- docs/source/torch/auto_deploy/support_matrix.md
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/source/torch.md
- docs/source/torch/auto_deploy/advanced/example_run.md
🧰 Additional context used
🧠 Learnings (4): the yibinl-nvidia examples-dependency learning, the moraxu CLI-flow/PyTorch-API learning, and the Python 3.8+ coding-guidelines learning quoted above.

Applied to files:

- docs/source/torch/auto_deploy/advanced/expert_configurations.md
🪛 markdownlint-cli2 (0.17.2)
docs/source/torch/auto_deploy/advanced/expert_configurations.md
7-7: Emphasis style
Expected: asterisk; Actual: underscore
(MD049, emphasis-style)
7-7: Emphasis style
Expected: asterisk; Actual: underscore
(MD049, emphasis-style)
12-12: Emphasis style
Expected: asterisk; Actual: underscore
(MD049, emphasis-style)
12-12: Emphasis style
Expected: asterisk; Actual: underscore
(MD049, emphasis-style)
🔇 Additional comments (1)
docs/source/torch/auto_deploy/advanced/expert_configurations.md (1)
`40-43`: **Inconsistent `compile_backend` examples may confuse users**
The CLI sample sets `--args.compile-backend=torch-opt`, while the YAML examples use `compile_backend: torch-compile` and `torch-opt`. Please confirm both values are valid and, if so, briefly mention the accepted enum to avoid readers assuming one is a typo.

Also applies to: 69-70, 100-101
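To make the ambiguity concrete, the two forms contrasted above would look like this side by side (values copied from the review comment; which spellings the enum actually accepts is precisely the open question raised here):

```yaml
# CLI form quoted above: --args.compile-backend=torch-opt
# YAML form quoted above; whether both values below are valid enum
# members is what this comment asks the authors to confirm:
compile_backend: torch-compile  # elsewhere in the same docs: torch-opt
```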
Actionable comments posted: 1
🧹 Nitpick comments (5)
docs/source/torch/auto_deploy/advanced/benchmarking_with_trtllm_bench.md (2)
`43-47`: **Inconsistent default for `compile_backend` may confuse users**
The YAML example sets `compile_backend: torch-opt`, but the reference table (Line 73) indicates a default of `torch-compile`. Either the example or the table default should be aligned so users are not left uncertain about which backend is recommended out-of-the-box.

`65-65`: **Minor terminology polish: “Multi-GPU”**
`Multi-gpu` should be capitalised as “Multi-GPU” to follow common GPU acronym styling used throughout the docs.

docs/source/torch/auto_deploy/auto-deploy.md (3)

`3-6`: **Redundant “Note:” label inside the `{note}` admonition**
The MyST admonition already renders a “Note” header; the explicit `Note:` inside the block is superfluous. Dropping it avoids a duplicated label.

````diff
 ```{note}
-Note:
 This project is in active development … reliability.
 ```
````

`41-41`: **Typo: “LLama” → “Llama”**
Capitalize only the first “L” to match the official model name.

`52-56`: **Section title “Support Matrix” precedes workflow text**
Immediately after the “## Support Matrix” heading, the content describes the workflow rather than the matrix and then links to a separate page. Consider either moving the workflow paragraph above the heading or renaming this heading to “Workflow Overview” for clarity.

📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 6ef577a594d88f2041227fdf7b92dd3c6710d6c1 and e5d81ee921d8ef3db4961b763af96039494da404.
⛔ Files ignored due to path filters (1)
- `docs/source/torch/auto_deploy/ad_overview.png` is excluded by `!**/*.png`
📒 Files selected for processing (2)
- `docs/source/torch/auto_deploy/advanced/benchmarking_with_trtllm_bench.md` (1 hunks)
- `docs/source/torch/auto_deploy/auto-deploy.md` (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: yibinl-nvidia
PR: #6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Learnt from: moraxu
PR: #6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: #6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
- `docs/source/torch/auto_deploy/advanced/benchmarking_with_trtllm_bench.md`
- `docs/source/torch/auto_deploy/auto-deploy.md`
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: #6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Applied to files:
- `docs/source/torch/auto_deploy/advanced/benchmarking_with_trtllm_bench.md`
- `docs/source/torch/auto_deploy/auto-deploy.md`
🪛 markdownlint-cli2 (0.17.2)
docs/source/torch/auto_deploy/auto-deploy.md
19-19: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3
(MD001, heading-increment)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
Thanks so much @Fridah-nv for helping with this
Actionable comments posted: 0
♻️ Duplicate comments (1)
docs/source/torch/auto_deploy/auto-deploy.md (1)
`8-8`: **Replace raw `<h4>` with Markdown heading to preserve TOC**
Raw HTML headings break Sphinx-generated navigation and violate MD001. Switch to a proper “## …” Markdown heading.
🧹 Nitpick comments (3)
docs/source/torch/auto_deploy/auto-deploy.md (3)
`19-19`: **Heading hierarchy skips a level (MD001)**
`### **Key Features:**` jumps from H1 to H3. Change this to `## Key Features` (or introduce an intermediate H2 before any H3s).

`12-15`: **Use Markdown image/figure syntax instead of raw HTML**
The `<div>`/`<img>` block may not render width/alt text consistently in Sphinx and disables automatic figure numbering. Prefer:

````
```{figure} ../../media/ad_overview.png
:width: 70%
:align: center

AutoDeploy overview and relation with TensorRT-LLM’s LLM API
```
````

`41-41`: **Typo: “LLama” → “Llama”**
Maintain correct model name casing in examples.

📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 45664956fda946decffcd366c7f77bb6c098a75c and 1e03c7d05e1529ace47cf09af524e9c33a41a8bc.
📒 Files selected for processing (2)
- `docs/source/torch/auto_deploy/auto-deploy.md` (1 hunks)
- `docs/source/torch/auto_deploy/support_matrix.md` (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- docs/source/torch/auto_deploy/support_matrix.md
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: yibinl-nvidia
PR: #6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Learnt from: moraxu
PR: #6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: galagam
PR: #6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
📚 Learning: 2025-08-06T13:58:07.506Z
Learnt from: galagam
PR: #6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
Applied to files:
- `docs/source/torch/auto_deploy/auto-deploy.md`
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
PR: #6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Applied to files:
- `docs/source/torch/auto_deploy/auto-deploy.md`
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: #6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
- `docs/source/torch/auto_deploy/auto-deploy.md`
🪛 markdownlint-cli2 (0.17.2)
docs/source/torch/auto_deploy/auto-deploy.md
19-19: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3
(MD001, heading-increment)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
1e03c7d
to
0c13211
Compare
Actionable comments posted: 0
🧹 Nitpick comments (2)
docs/source/torch/auto_deploy/auto-deploy.md (2)
`3-6`: **Remove redundant “Note:” inside the `{note}` admonition**
The `{note}` directive already renders a “Note” label; repeating it clutters the output.

````diff
-```{note}
-Note:
+```{note}
````

`12-15`: **Consider using a MyST figure directive instead of raw HTML for the image**
Pure HTML blocks are fine but prevent automatic width/alt validation and can confuse translators. A MyST equivalent keeps docs pure-Markdown:

````
```{figure} ../../media/ad_overview.png
:alt: AutoDeploy integration with LLM API
:width: 70%

AutoDeploy overview and relation with TensorRT-LLM's LLM API
```
````

📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 1e03c7d05e1529ace47cf09af524e9c33a41a8bc and 0c132115814859b4fce873deb49dee5b95c944c3.
⛔ Files ignored due to path filters (1)
- `docs/source/media/ad_overview.png` is excluded by `!**/*.png`
📒 Files selected for processing (2)
- `docs/source/torch/auto_deploy/auto-deploy.md` (1 hunks)
- `docs/source/torch/auto_deploy/support_matrix.md` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/source/torch/auto_deploy/support_matrix.md
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: yibinl-nvidia
PR: #6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Learnt from: moraxu
PR: #6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: galagam
PR: #6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
📚 Learning: 2025-08-06T13:58:07.506Z
Learnt from: galagam
PR: #6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
Applied to files:
- `docs/source/torch/auto_deploy/auto-deploy.md`
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
PR: #6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Applied to files:
- `docs/source/torch/auto_deploy/auto-deploy.md`
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: #6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
- `docs/source/torch/auto_deploy/auto-deploy.md`
🪛 markdownlint-cli2 (0.17.2)
docs/source/torch/auto_deploy/auto-deploy.md
19-19: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3
(MD001, heading-increment)
🔇 Additional comments (1)
docs/source/torch/auto_deploy/auto-deploy.md (1)
`8-8`: **Replace raw HTML heading with Markdown to restore TOC generation and fix MD001**
Raw `<h4>` headings bypass Sphinx/Myst-Parser’s automatic TOC & anchor generation and break the heading-increment rule reported by markdownlint (MD001). Convert it to an H2 so the hierarchy becomes: H1 → H2 → H3 …

```diff
-<h4> Seamless Model Deployment from PyTorch to TRT-LLM</h4>
+## Seamless Model Deployment from PyTorch to TRT-LLM
```
/bot skip --comment "Doc change only" |
Actionable comments posted: 2
♻️ Duplicate comments (1)
docs/source/torch/auto_deploy/auto-deploy.md (1)
`8-8`: **Replace raw HTML heading with Markdown H2 to restore Sphinx TOC/navigation**
Raw `<h4>` breaks heading structure and Sphinx link generation. Use a Markdown heading instead.

```diff
-<h4> Seamless Model Deployment from PyTorch to TRT-LLM</h4>
+## Seamless Model Deployment from PyTorch to TRT-LLM
```
🧹 Nitpick comments (6)
docs/source/torch/auto_deploy/auto-deploy.md (6)
`3-6`: **Simplify note admonition content**
The “Note:” label inside a `{note}` block is redundant. Keep the content only.

````diff
 ```{note}
-Note:
 This project is in active development and is currently in a prototype stage. The code is experimental, subject to change, and may include backward-incompatible updates. While we strive for correctness, we provide no guarantees regarding functionality, stability, or reliability.
 ```
````

`10-10`: **Use correct product names: “Hugging Face Transformers”**
Normalize terminology and casing.

```diff
-AutoDeploy is a prototype designed to simplify and accelerate the deployment of PyTorch models, including off-the-shelf models like those from HuggingFace transformers library, to TensorRT-LLM.
+AutoDeploy is a prototype designed to simplify and accelerate the deployment of PyTorch models, including off-the-shelf models such as those from the Hugging Face Transformers library, to TensorRT-LLM.
```
`21-21`: **Use proper CUDA terminology**
“CudaGraph” → “CUDA Graphs.”

```diff
-- **Optimized Inference:** Built-in transformations for sharding, quantization, KV-cache integration, MHA fusion, and CudaGraph optimization.
+- **Optimized Inference:** Built-in transformations for sharding, quantization, KV-cache integration, MHA fusion, and CUDA Graphs optimization.
```
`39-41`: **Fix typos and comma splice; standardize naming and phrasing**
“LLama” → “Llama”, split run-on sentence, “entrypoint” → “entry point”, “Huggingface” → “Hugging Face”, “AutoDeploy” naming.

```diff
-You are ready to run an in-framework LLama Demo now.
+You're ready to run an in-framework Llama demo now.
-The general entrypoint to run the auto-deploy demo is the `build_and_run_ad.py` script, Checkpoints are loaded directly from Huggingface (HF) or a local HF-like directory:
+The general entry point for the AutoDeploy demo is the `build_and_run_ad.py` script. Checkpoints are loaded directly from Hugging Face (HF) or a local HF-like directory:
```
`58-63`: **Minor title/style consistency in Advanced Usage list**
Keep titles concise and consistent.

```diff
-- [Example Run Script](./advanced/example_run.md)
-- [Logging Level](./advanced/logging.md)
-- [Incorporating AutoDeploy into Your Own Workflow](./advanced/workflow.md)
+- [Example run script](./advanced/example_run.md)
+- [Logging levels](./advanced/logging.md)
+- [Integrating AutoDeploy into your workflow](./advanced/workflow.md)
 - [Expert Configurations](./advanced/expert_configurations.md)
 - [Performance benchmarking](./advanced/benchmarking_with_trtllm_bench.md)
```
`80-82`: **Fix “GitHub” casing and comma splice**
Use correct brand casing and avoid comma splices for clarity.

```diff
-To track development progress and contribute, visit our [Github Project Board](https://github.com/orgs/NVIDIA/projects/83/views/13).
-We welcome community contributions, see `examples/auto_deploy/CONTRIBUTING.md` for guidelines.
+To track development progress and contribute, visit our [GitHub Project Board](https://github.com/orgs/NVIDIA/projects/83/views/13).
+We welcome community contributions. See `examples/auto_deploy/CONTRIBUTING.md` for guidelines.
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/source/torch/auto_deploy/auto-deploy.md
(1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
📚 Learning: 2025-08-06T13:58:07.506Z
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
Applied to files:
docs/source/torch/auto_deploy/auto-deploy.md
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Applied to files:
docs/source/torch/auto_deploy/auto-deploy.md
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
docs/source/torch/auto_deploy/auto-deploy.md
🪛 markdownlint-cli2 (0.17.2)
docs/source/torch/auto_deploy/auto-deploy.md
17-17: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3
(MD001, heading-increment)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
PR_Github #14635 [ skip ] triggered by Bot |
PR_Github #14635 [ skip ] completed with state |
Added some suggestions, but LGTM overall.
Thank you @chenopis for the valuable feedback! The doc looks better indeed. |
Signed-off-by: Frida Hou <[email protected]>
move autodeploy doc into torch, update links
Signed-off-by: Frida Hou <[email protected]>
update contents
Signed-off-by: Frida Hou <[email protected]>
replace hyperlink with modular path
Signed-off-by: Frida Hou <[email protected]>
minor
Signed-off-by: Frida Hou <[email protected]>
Signed-off-by: Suyog Gupta <[email protected]>
Signed-off-by: Frida Hou <[email protected]>
minor fix
Signed-off-by: Frida Hou <[email protected]>
minor fix
Signed-off-by: Frida Hou <[email protected]>
Signed-off-by: Frida Hou <[email protected]>
Signed-off-by: Frida Hou <[email protected]>
ac07367
to
3ad4cfe
Compare
/bot skip --comment "Doc change only" |
PR_Github #14653 [ skip ] triggered by Bot |
PR_Github #14653 [ skip ] completed with state |
Summary by CodeRabbit
trtllm-bench utility.

Description
built the doc and tested locally.
Test Coverage
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user friendly way for developers to interact with a Jenkins server.
Run
/bot [-h|--help]
to print this help message. See details below for each supported subcommand.
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]
Launch build/test pipelines. All previously running jobs will be killed.
--reuse-test (optional)pipeline-id
(OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
--disable-reuse-test
(OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
--disable-fail-fast
(OPTIONAL) : Disable fail fast on build/tests/infra failures.
--skip-test
(OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
--stage-list "A10-PyTorch-1, xxx"
(OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
--gpu-type "A30, H100_PCIe"
(OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
--test-backend "pytorch, cpp"
(OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
--only-multi-gpu-test
(OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
--disable-multi-gpu-test
(OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
--add-multi-gpu-test
(OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.
--post-merge
(OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"
(OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
--detailed-log
(OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
--debug
(OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.
For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.
kill
kill
Kill all running builds associated with pull request.
skip
skip --comment COMMENT
Skip testing for latest commit on pull request.
--comment "Reason for skipping build/test"
is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.reuse-pipeline
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
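Putting the flags above together, a typical PR comment might look like the following (the stage and GPU names are the placeholder examples from this help text, not a recommendation for any particular PR):

```
/bot run --disable-fail-fast --stage-list "A10-PyTorch-1" --gpu-type "A30, H100_PCIe"
```

Per the notes above, a run restricted with --stage-list or --gpu-type does not update the GitHub check status, so a full unrestricted /bot run is still needed before merge.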