[TRTLLM-7030][fix] Refactor the example doc of dist-serving #6766

Open · wants to merge 2 commits into base: main

Conversation

@Shixiaowei02 (Collaborator) commented Aug 9, 2025

Summary by CodeRabbit

  • New Features

    • Expanded and reorganized documentation for disaggregated serving, including detailed SLURM cluster usage instructions, updated configuration examples, and new sample YAML files and batch scripts.
  • Bug Fixes

    • Improved validation for context-only requests and disaggregated server responses to enhance error handling.
  • Refactor

    • Updated all configuration and code references for cache transceiver backend values from lowercase (e.g., "default", "ucx") to uppercase (e.g., "DEFAULT", "UCX").
  • Documentation

    • Clarified and extended user guides, removed outdated FAQ entries, and updated documentation links and examples for consistency across the project.
  • Tests

    • Adjusted test configurations and assertions to align with updated backend value casing requirements.

coderabbitai bot (Contributor) commented Aug 9, 2025

📝 Walkthrough

Walkthrough

This update standardizes the backend string values for cache transceiver configuration from lowercase (e.g., "default", "ucx") to uppercase (e.g., "DEFAULT", "UCX") across code, configs, and tests. It also updates related documentation, environment variables, and adds validation checks for disaggregated serving, with expanded and reorganized documentation for SLURM and dynamic scaling.

Changes

Cohort / File(s) Change Summary
Backend String Uppercase Standardization
tensorrt_llm/llmapi/llm_args.py, examples/disaggregated/disagg_config.yaml, examples/disaggregated/slurm/benchmark/gen_yaml.py, tests/integration/defs/accuracy/test_disaggregated_serving.py, tests/integration/defs/disaggregated/test_configs/*, tests/integration/defs/disaggregated/test_disaggregated_etcd.py, tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py, tests/unittest/llmapi/test_llm_args.py
All instances of backend string values in cache transceiver configs and related code/tests changed from lowercase to uppercase (e.g., "default" → "DEFAULT", "ucx" → "UCX"). Type annotations updated accordingly.
Documentation and Usage Examples
benchmarks/cpp/README.md, examples/cpp/executor/README.md, examples/disaggregated/README.md, examples/wide_ep/README.md, examples/wide_ep/slurm_scripts/README.md, examples/wide_ep/slurm_scripts/submit.sh, docs/source/advanced/disaggregated-service.md, docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md
Documentation updated for new environment variables, uppercase backend strings, and reorganized/expanded instructions for disaggregated serving, SLURM usage, and dynamic scaling. Deprecated/removed outdated FAQ and MPI-based instructions.
New Disaggregated SLURM Example Assets
examples/disaggregated/slurm/simple_example/ctx_extra-llm-api-config.yaml, examples/disaggregated/slurm/simple_example/gen_extra-llm-api-config.yaml, examples/disaggregated/slurm/simple_example/disagg_config.yaml, examples/disaggregated/slurm/simple_example/launch.slurm
Added new YAML config files and a SLURM batch script for launching disaggregated serving examples, with correct backend casing and explicit configuration for context/generation servers.
Validation and Error Handling Enhancements
tensorrt_llm/_torch/pyexecutor/py_executor.py, tensorrt_llm/serve/openai_disagg_server.py
Added assertion to disallow context-only requests when disaggregation is disabled, and added validation for non-null ctx_request_id in context server response.
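For illustration, below is a minimal, self-contained sketch of the kind of guard described above for py_executor.py; the class and attribute names (Request, is_context_only, kv_cache_transceiver) are assumptions made for this example, not the actual implementation.

# Hypothetical sketch only; names are illustrative, not the real py_executor.py code.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    is_context_only: bool = False


class Executor:
    def __init__(self, kv_cache_transceiver: Optional[object] = None) -> None:
        # A transceiver is only configured when disaggregated serving is enabled.
        self.kv_cache_transceiver = kv_cache_transceiver

    def validate(self, request: Request) -> None:
        # Context-only requests are only meaningful in disaggregated mode,
        # so reject them when no KV-cache transceiver is configured.
        if request.is_context_only:
            assert self.kv_cache_transceiver is not None, (
                "Context-only requests are not supported when disaggregation is disabled")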

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Proxy/Router
    participant ContextServer
    participant GenerationServer

    Client->>Proxy/Router: Send request (context/generation)
    Proxy/Router->>ContextServer: Forward context request
    ContextServer-->>Proxy/Router: Return response (with disaggregated_params)
    Proxy/Router->>GenerationServer: Forward generation request (with ctx_request_id)
    GenerationServer-->>Proxy/Router: Return generated output
    Proxy/Router-->>Client: Return final response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

Documentation

Suggested reviewers

  • pcastonguay
  • kaiyux
  • nv-guomingz


@Shixiaowei02 (Collaborator, Author) commented:

/bot run

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 15

🔭 Outside diff range comments (4)
examples/disaggregated/slurm/benchmark/gen_yaml.py (1)

1-1: examples/disaggregated/slurm/benchmark/gen_yaml.py – Add NVIDIA SPDX header

Per our Python file conventions, every non-test .py must start with the NVIDIA SPDX copyright header
including the current year (2025). Please insert the following lines at the very top of
examples/disaggregated/slurm/benchmark/gen_yaml.py, before any imports:

+ # SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ # SPDX-License-Identifier: Apache-2.0
  import argparse

This matches the canonical header used throughout the repo and satisfies compliance.

tests/integration/defs/disaggregated/test_disaggregated_etcd.py (1)

1-2: Update copyright year to include 2025.

Per project guidelines, headers should include the current year for .py files.

-# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
tensorrt_llm/llmapi/llm_args.py (1)

1050-1054: Normalize backend to uppercase for backward compatibility

The Literal now enforces uppercase inputs. Existing external configs using "default"/"ucx" (lowercase) will fail validation. Add a pre-validation normalizer to accept lowercase and coerce to uppercase, keeping the codebase strict and user-facing BC:

 @PybindMirror.mirror_pybind_fields(_CacheTransceiverConfig)
 class CacheTransceiverConfig(StrictBaseModel, PybindMirror):
@@
-    backend: Optional[Literal["DEFAULT", "UCX", "NIXL", "MPI"]] = Field(
+    backend: Optional[Literal["DEFAULT", "UCX", "NIXL", "MPI"]] = Field(
         default=None,
         description=
         "The communication backend type to use for the cache transceiver.")
@@
     def _to_pybind(self):
         return _CacheTransceiverConfig(
             backend=_CacheTransceiverBackendType.from_string(self.backend),
             max_tokens_in_buffer=self.max_tokens_in_buffer)
+
+    @field_validator('backend', mode='before')
+    @classmethod
+    def normalize_backend(cls, v):
+        if v is None:
+            return v
+        if isinstance(v, str):
+            v = v.upper()
+        return v

This preserves compatibility with legacy lowercase configs while enforcing uppercase internally.
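
As a quick sanity check, assuming the normalize_backend pre-validator above is adopted (and using the import path from this PR's file list), legacy lowercase input would be coerced rather than rejected:

from tensorrt_llm.llmapi.llm_args import CacheTransceiverConfig

# Without the pre-validator, a lowercase value fails the Literal validation;
# with it, the value is upper-cased before validation runs.
cfg = CacheTransceiverConfig(backend="ucx", max_tokens_in_buffer=1024)
assert cfg.backend == "UCX"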

examples/disaggregated/README.md (1)

232-236: Typo in metadata config: refersh_interval → refresh_interval

The YAML key is misspelled; users copying this will get a non-working config.

Apply this diff:

- refersh_interval: 10.0
+ refresh_interval: 10.0
🧹 Nitpick comments (26)
tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only_trt_backend.yaml (1)

16-16: Keep quoting style consistent across configs

Uppercasing looks good. For consistency with other files that quote this literal, consider quoting DEFAULT.

-    backend: DEFAULT
+    backend: "DEFAULT"
tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only.yaml (1)

17-17: Quote the backend literal for consistency

Functionally equivalent, but quoting matches other config files and avoids mixed styles.

-    backend: DEFAULT
+    backend: "DEFAULT"
tests/integration/defs/disaggregated/test_configs/disagg_config_load_balance.yaml (1)

22-22: Unify quoting within the same file

Line 22 is unquoted while line 41 is quoted. Pick one style; recommending quoted to match other configs.

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 41-41

tests/integration/defs/disaggregated/test_configs/disagg_config_trt_backend.yaml (1)

13-13: Quote backend literals for consistency across test configs

Both entries are correct; quoting them keeps style uniform with other files.

-    backend: DEFAULT
+    backend: "DEFAULT"
@@
-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 21-21

tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse.yaml (1)

18-18: Quote backend string for consistency

YAML accepts bare words, but quoting string literals avoids ambiguity and keeps style consistent with other example configs.

Apply this diff:

-    backend: DEFAULT
+    backend: "DEFAULT"

And:

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 33-33

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1.yaml (1)

13-13: Quote backend string for consistency

For consistency with other configs and to avoid YAML ambiguity, quote the string literal.

Apply this diff:

-    backend: DEFAULT
+    backend: "DEFAULT"

And:

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 21-21

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1_trt_backend.yaml (1)

11-11: Quote backend string for consistency

Recommend quoting to match style used in example configs.

Apply this diff:

-    backend: DEFAULT
+    backend: "DEFAULT"

And:

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 19-19

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one_mtp.yaml (1)

17-17: Quote literal for YAML consistency.

To match other string fields (e.g., backend: "pytorch") and avoid accidental keyword interpretation by tooling, quote the value.

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 26-26

tests/integration/defs/disaggregated/test_configs/disagg_config_conditional.yaml (1)

21-21: Prefer quoting string literals.

Minor consistency tweak with other quoted strings in the file.

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 36-36

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite.yaml (1)

13-13: Nit: quote the string for clarity.

Keeps style consistent and avoids any YAML/tooling edge-cases.

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 21-21

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp4_genpp4.yaml (1)

19-19: Nit: quote the backend value.

Stylistic consistency with other quoted strings.

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 34-34

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp_attention_dp_overlap.yaml (1)

17-17: Nit: add quotes around string literal.

For consistency with the rest of the YAML config strings.

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 27-27

tests/integration/defs/disaggregated/test_configs/disagg_config_diff_max_tokens.yaml (1)

13-13: Uppercase ‘backend’ values confirmed; ensure parser normalization

All occurrences of backend: default/ucx/mpi/nixl have been updated to uppercase (DEFAULT, UCX, etc.), and a repository-wide scan found no remaining lowercase entries.

Next steps:

  • Verify that the configuration loader (e.g., in your YAML parsing code) normalizes incoming backend values to uppercase at parse-time to maintain backward compatibility for configs that may still use lowercase.
  • No changes to the existing YAML files are required.
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_overlap_cuda_graph.yaml (1)

12-12: Nit: unify YAML quoting style for string scalars

Here DEFAULT is unquoted, while other configs in this PR quote "DEFAULT". Consider standardizing (prefer quoting) for readability and to avoid YAML edge-cases.

Also applies to: 23-23

tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance_deepseek_v3.yaml (1)

20-20: Nit: consistent quoting across configs

These lines use quotes, while some other configs in this PR don’t. Consider consistently quoting string scalars repo-wide.

Also applies to: 36-36

tests/integration/defs/disaggregated/test_configs/disagg_config_torch_sampler.yaml (1)

19-19: Nit: align YAML scalar quoting style

Recommend choosing a single style (quoted vs unquoted) for string values across all disaggregated configs.

Also applies to: 35-35

tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (1)

12-12: Nit: standardize quoting

Unify quoting of string scalars across the PR for readability and consistency.

Also applies to: 21-21

benchmarks/cpp/README.md (1)

339-342: Add language to fenced code blocks (bash) for readability and consistency.

Use bash on the opening triple backticks enclosing the export/mpirun snippets so tooling renders it properly.

Example:

-```
+```bash
 export TRTLLM_USE_UCX_KVCACHE=1
 mpirun -n ${proc} benchmarks/disaggServerBenchmark ...
-```
+```

Also applies to: 347-351

examples/disaggregated/slurm/simple_example/ctx_extra-llm-api-config.yaml (1)

4-6: Optional: quote backend value for style consistency across configs.

Some configs use quoted strings (e.g., "DEFAULT"). Consider:

-  backend: UCX
+  backend: "UCX"
examples/cpp/executor/README.md (1)

129-134: Specify language on fenced block to satisfy markdownlint (MD040).

Mark this code block as bash.

-```
+```bash
 export TRTLLM_USE_UCX_KVCACHE=1

 mpirun -n <num_ranks> --allow-run-as-root --oversubscribe ./executorExampleDisaggregated ...
-```
+```
tests/unittest/llmapi/test_llm_args.py (2)

669-672: LGTM on "UCX"; add a small fix for Ruff F405

The value change is correct. Ruff warns F405 here due to star import; add an inline noqa to keep the test concise:

-        config = CacheTransceiverConfig(backend="UCX",
+        config = CacheTransceiverConfig(backend="UCX",  # noqa: F405
                                         max_tokens_in_buffer=1024)

Alternatively, explicitly import the class to satisfy both Ruff and the project’s “maintain namespace” guideline:

import tensorrt_llm.llmapi.llm_args as llm_args
config = llm_args.CacheTransceiverConfig(backend="UCX", max_tokens_in_buffer=1024)

677-677: Mirror the F405 fix on the negative test

Add noqa here as well:

-            CacheTransceiverConfig(backend="UCX", invalid_config="should_fail")
+            CacheTransceiverConfig(backend="UCX", invalid_config="should_fail")  # noqa: F405

If you adopt namespaced imports as above, switch to llm_args.CacheTransceiverConfig to avoid noqa.

tensorrt_llm/serve/openai_disagg_server.py (1)

207-209: Good defensive check; consider tightening validation

The None check is useful. If the protocol guarantees that ctx_request_id is a positive integer, also validate its type and range:

  • isinstance(ctx_request_id, int)
  • ctx_request_id > 0

Optionally return a 4xx HTTP error upstream instead of generic ValueError, depending on error handling policy.

I can add a focused unit/integration test that asserts a missing/invalid ctx_request_id raises the expected error. Want a PR follow-up?
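
As a rough sketch of that tightening (assuming a FastAPI-style server; the helper name and status-code choice are illustrative, not the actual openai_disagg_server.py code):

from fastapi import HTTPException


def require_ctx_request_id(ctx_request_id) -> int:
    # Reject missing or malformed ids with a client-visible HTTP error
    # instead of a generic ValueError; the exact status code is a policy choice.
    if ctx_request_id is None:
        raise HTTPException(status_code=400,
                            detail="Context server response is missing ctx_request_id")
    if not isinstance(ctx_request_id, int) or ctx_request_id <= 0:
        raise HTTPException(status_code=400,
                            detail=f"Invalid ctx_request_id: {ctx_request_id!r}")
    return ctx_request_id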

examples/disaggregated/README.md (3)

113-115: Add a language to the fenced code block (markdownlint MD040)

Specify the language for the code block.

Apply this diff:

-```
+```bash
 python3 ./clients/disagg_client.py -c disagg_config.yaml -p ./clients/prompts.json -e chat

171-199: SLURM srun example: ensure concurrency and parameter consistency

Consider mirroring the launch.slurm fixes here:
  • Background the long-running srun services for the context and generation servers; keep the proxy in the foreground.
  • Ensure the generation tp_size in the command matches the comment (4).


Would you like me to propose a full revised snippet aligned with the launch.slurm pattern?

245-250: Minor: log filename reuse

The example for adding another generation server writes to log_gen_0 again. Consider log_gen_1 to avoid clobbering.


-    --metadata_server_config_file ./metadata_config.yaml &> log_gen_0 &
+    --metadata_server_config_file ./metadata_config.yaml &> log_gen_1 &
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d643aef and ea3715d.

📒 Files selected for processing (55)
  • benchmarks/cpp/README.md (1 hunks)
  • docs/source/advanced/disaggregated-service.md (0 hunks)
  • docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md (1 hunks)
  • examples/cpp/executor/README.md (1 hunks)
  • examples/disaggregated/README.md (5 hunks)
  • examples/disaggregated/disagg_config.yaml (1 hunks)
  • examples/disaggregated/slurm/benchmark/gen_yaml.py (2 hunks)
  • examples/disaggregated/slurm/simple_example/ctx_extra-llm-api-config.yaml (1 hunks)
  • examples/disaggregated/slurm/simple_example/disagg_config.yaml (1 hunks)
  • examples/disaggregated/slurm/simple_example/gen_extra-llm-api-config.yaml (1 hunks)
  • examples/disaggregated/slurm/simple_example/launch.slurm (1 hunks)
  • examples/wide_ep/README.md (1 hunks)
  • examples/wide_ep/slurm_scripts/README.md (2 hunks)
  • examples/wide_ep/slurm_scripts/submit.sh (1 hunks)
  • tensorrt_llm/_torch/pyexecutor/py_executor.py (1 hunks)
  • tensorrt_llm/llmapi/llm_args.py (1 hunks)
  • tensorrt_llm/serve/openai_disagg_server.py (1 hunks)
  • tests/integration/defs/accuracy/test_disaggregated_serving.py (10 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance_deepseek_v3.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse_deepseek_v3.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_conditional.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_conditional_deepseek_v3.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_genpp2.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_gentp2.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp4_genpp4.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp_attention_dp_overlap.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_two_mtp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1_trt_backend.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one_mtp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap_cuda_graph.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_overlap_cuda_graph.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2pp2_gentp2pp2.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cuda_graph_padding.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_diff_max_tokens.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only_trt_backend.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_load_balance.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_mixed.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_overlap.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_torch_sampler.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_trt_backend.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_disaggregated_etcd.py (1 hunks)
  • tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (3 hunks)
  • tests/unittest/llmapi/test_llm_args.py (1 hunks)
💤 Files with no reviewable changes (1)
  • docs/source/advanced/disaggregated-service.md
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • examples/disaggregated/slurm/benchmark/gen_yaml.py
  • tests/integration/defs/disaggregated/test_disaggregated_etcd.py
  • tensorrt_llm/serve/openai_disagg_server.py
  • tests/integration/defs/accuracy/test_disaggregated_serving.py
  • tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py
  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tests/unittest/llmapi/test_llm_args.py
  • tensorrt_llm/llmapi/llm_args.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • examples/disaggregated/slurm/benchmark/gen_yaml.py
  • tests/integration/defs/disaggregated/test_disaggregated_etcd.py
  • tensorrt_llm/serve/openai_disagg_server.py
  • tests/integration/defs/accuracy/test_disaggregated_serving.py
  • tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py
  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tests/unittest/llmapi/test_llm_args.py
  • tensorrt_llm/llmapi/llm_args.py
🧠 Learnings (6)
📚 Learning: 2025-08-06T13:58:07.506Z
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

Applied to files:

  • docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md
  • examples/disaggregated/slurm/simple_example/launch.slurm
  • tensorrt_llm/llmapi/llm_args.py
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md
  • examples/cpp/executor/README.md
  • tensorrt_llm/llmapi/llm_args.py
  • examples/disaggregated/README.md
📚 Learning: 2025-07-22T09:22:14.726Z
Learnt from: yechank-nvidia
PR: NVIDIA/TensorRT-LLM#6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()` is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call `strip_for_generation()` to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.

Applied to files:

  • docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md
  • benchmarks/cpp/README.md
  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • examples/disaggregated/slurm/simple_example/launch.slurm
  • examples/disaggregated/README.md
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md
  • benchmarks/cpp/README.md
  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • examples/disaggregated/slurm/simple_example/launch.slurm
  • tensorrt_llm/llmapi/llm_args.py
  • examples/disaggregated/README.md
📚 Learning: 2025-07-17T09:01:27.402Z
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

Applied to files:

  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tensorrt_llm/llmapi/llm_args.py
📚 Learning: 2025-08-09T02:04:49.580Z
Learnt from: Fridah-nv
PR: NVIDIA/TensorRT-LLM#6760
File: tensorrt_llm/_torch/auto_deploy/models/quant_config_reader.py:81-98
Timestamp: 2025-08-09T02:04:49.580Z
Learning: In TensorRT-LLM's auto_deploy module, torch.dtype values in configuration dictionaries must be stored as string representations (e.g., "float16" instead of torch.float16) because OmegaConf.merge does not support torch.dtype types. These string representations are converted to actual torch.dtype objects in downstream code.

Applied to files:

  • tensorrt_llm/llmapi/llm_args.py
🪛 markdownlint-cli2 (0.17.2)
examples/cpp/executor/README.md

129-129: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

examples/disaggregated/README.md

28-28: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


45-45: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


66-66: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


99-99: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


113-113: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


131-131: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


148-148: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


163-163: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


174-174: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


191-191: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.12.2)
tests/unittest/llmapi/test_llm_args.py

669-669: CacheTransceiverConfig may be undefined, or defined from star imports

(F405)


677-677: CacheTransceiverConfig may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (46)
tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse.yaml (2)

18-18: LGTM on backend standardization

Using uppercase backend literals aligns with the updated type constraints and improves consistency across configs and tests.

Also applies to: 33-33


18-18: Standardize backend literals to uppercase across all test configurations

The search uncovered remaining lowercase backend values in several integration test YAML files. Please update these entries to their uppercase equivalents:

  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_ucx.yaml
    • Line 12: backend: "ucx"backend: "UCX"
    • Line 20: backend: "ucx"backend: "UCX"

  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_nixl.yaml
    • Line 12: backend: "nixl"backend: "NIXL"
    • Line 20: backend: "nixl"backend: "NIXL"

  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_mpi.yaml
    • Line 12: backend: "mpi"backend: "MPI"
    • Line 20: backend: "mpi"backend: "MPI"

Example diff:

-    backend: "ucx"
+    backend: "UCX"

After updating these, re-run the lowercase-literal search to confirm no lowercase backends remain.

Likely an incorrect or invalid review comment.

examples/disaggregated/disagg_config.yaml (1)

14-14: LGTM on backend standardization

Already quoted; aligns with the new uppercase literal set and keeps YAML style consistent.

Also applies to: 22-22

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1.yaml (1)

13-13: LGTM on backend standardization

Uppercase value matches the updated type constraints used by CacheTransceiverConfig.

Also applies to: 21-21

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1_trt_backend.yaml (1)

11-11: LGTM on backend standardization

Uppercase matches the new accepted literals; no other changes needed.

Also applies to: 19-19

examples/disaggregated/slurm/benchmark/gen_yaml.py (2)

200-201: Uppercasing backend to "DEFAULT" — consistent with new standard.

Matches the uppercase literal convention introduced across the codebase. No issues.


228-229: Uppercasing backend to "DEFAULT" — consistent with new standard.

Matches the uppercase literal convention introduced across the codebase. No issues.

examples/disaggregated/slurm/simple_example/gen_extra-llm-api-config.yaml (1)

1-3: UCX backend and buffer size look good.

Config aligns with the uppercase backend convention and adds a reasonable buffer size. LGTM.

tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance.yaml (2)

24-24: Backend literal standardized to "DEFAULT" — OK.


38-38: Backend literal standardized to "DEFAULT" — OK.

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_two_mtp.yaml (2)

17-17: Backend literal standardized to "DEFAULT" — OK.


28-28: Backend literal standardized to "DEFAULT" — OK.

tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse_deepseek_v3.yaml (2)

18-18: Backend literal standardized to "DEFAULT" — OK.


33-33: Backend literal standardized to "DEFAULT" — OK.

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap.yaml (2)

14-14: Backend literal standardized to DEFAULT — looks correct

Matches the updated uppercase literals and is consistent across context and generation server configs.

Also applies to: 24-24


14-14: No lowercase “default” remains in backend assignments
Verified via a case-insensitive search across all YAML and code files—every backend: entry uses uppercase DEFAULT. No mixed-case or lowercase default usages were found.

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_genpp2.yaml (1)

19-19: Consistent capitalization of backend (DEFAULT) — OK

Aligned with updated type validation. No other changes required here.

Also applies to: 34-34

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one.yaml (1)

14-14: Backend set to DEFAULT for both roles — OK

Matches the new allowed literals and stays consistent across sections.

Also applies to: 23-23

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp.yaml (1)

17-17: Updated backend to DEFAULT — looks good

Consistent with the standardized uppercase values and surrounding configs.

Also applies to: 26-26

tests/integration/defs/disaggregated/test_configs/disagg_config_mixed.yaml (1)

13-13: DEFAULT backend capitalization applied — good

Change is minimal and consistent across context and generation servers.

Also applies to: 21-21

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one_mtp.yaml (1)

17-17: Uppercase backend value standardization LGTM.

Aligns with the updated literals in code/tests. No functional concerns.

Also applies to: 26-26

tests/integration/defs/disaggregated/test_configs/disagg_config_conditional.yaml (1)

21-21: Uppercase backend value standardization LGTM.

Consistent with updated accepted literals; test config looks correct.

Also applies to: 36-36

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite.yaml (1)

13-13: Standardization to uppercase DEFAULT is correct.

Matches the new allowed values; no issues spotted.

Also applies to: 21-21

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp4_genpp4.yaml (1)

19-19: LGTM on backend casing change.

Consistent with the updated enum-like literals across configs.

Also applies to: 34-34

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp_attention_dp_overlap.yaml (1)

17-17: Backend normalization to DEFAULT looks good.

No functional issues with the change.

Also applies to: 27-27

tests/integration/defs/disaggregated/test_configs/disagg_config_overlap.yaml (1)

19-19: LGTM: standardized backend literal to DEFAULT

Matches the new uppercase convention and remains valid YAML.

Also applies to: 34-34

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2pp2_gentp2pp2.yaml (1)

19-19: LGTM: backend literal set to uppercase DEFAULT

Consistent with updated type annotations and other configs.

Also applies to: 34-34

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite.yaml (1)

13-13: LGTM: uppercase DEFAULT applied

Change is correct and consistent with the broader refactor.

Also applies to: 21-21

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap_cuda_graph.yaml (1)

13-13: LGTM: backend literal standardized to DEFAULT

Aligned with the new accepted values; no issues spotted.

Also applies to: 25-25

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_overlap_cuda_graph.yaml (1)

12-12: Backend literal normalized to DEFAULT — looks correct

Matches the uppercase literal convention enforced in code. No functional concerns.

Also applies to: 23-23

tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance_deepseek_v3.yaml (1)

20-20: Backend literal standardized to "DEFAULT"

Aligned with new accepted uppercase literals. Good update.

Also applies to: 36-36

tests/integration/defs/disaggregated/test_configs/disagg_config_torch_sampler.yaml (1)

19-19: Use of "DEFAULT" backend matches enforced literals

Change is correct and consistent with the API expectations.

Also applies to: 35-35

tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (1)

12-12: "DEFAULT" backend value normalization LGTM

Consistent with the new literal set and validation.

Also applies to: 21-21

benchmarks/cpp/README.md (1)

339-342: Env var rename to UCX backend is correct and consistent.

Switching to TRTLLM_USE_UCX_KVCACHE=1 matches the new transceiver backend naming and the rest of the docs. LGTM.

Also applies to: 347-351

examples/disaggregated/slurm/simple_example/ctx_extra-llm-api-config.yaml (1)

1-6: Config looks good and aligns with disaggregated context constraints.

UCX backend and disabling overlap scheduler are appropriate here. No issues found.

examples/wide_ep/slurm_scripts/README.md (1)

20-20: Path updates look right.

References now point to examples/disaggregated/benchmark/slurm. LGTM.

Also applies to: 38-40

examples/cpp/executor/README.md (2)

127-127: UCX env var requirement is correct.

Using TRTLLM_USE_UCX_KVCACHE=1 for disaggregated executor is consistent with the new backend naming. Good change.

Also applies to: 130-134


127-127: Remove all remaining TRTLLM_USE_MPI_KVCACHE references

Stale mentions of the deprecated MPI-based KV cache flag were found in the following locations. Please update or remove them (e.g., switch to TRTLLM_USE_UCX_KVCACHE where appropriate):

  • tests/integration/defs/triton_server/test_triton_llm.py (≈line 3380)
  • tests/integration/defs/disaggregated/test_disaggregated.py (line 701)
  • tests/integration/defs/cpp/test_e2e.py (line 48)
  • tests/integration/defs/cpp/test_multi_gpu.py (line 35)
  • tests/integration/defs/triton_server/test.sh (lines 1130, 1145)
  • tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py (line 44)
  • cpp/tensorrt_llm/common/envUtils.cpp (line 271)
  • examples/disaggregated/README.md (line 304)
  • docker/common/install_mpi4py.sh (lines 34, 47, 50)
⛔ Skipped due to learnings
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{h,hpp} : Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Learnt from: sklevtsov-nvidia
PR: NVIDIA/TensorRT-LLM#3294
File: cpp/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion/sm90_visitor_scatter.hpp:36-36
Timestamp: 2025-08-08T05:06:31.537Z
Learning: CUTLASS extension files (under cpp/tensorrt_llm/cutlass_extensions/) follow CUTLASS coding style conventions, including using #pragma once instead of TRTLLM_ prefixed header guards, even though they are .hpp files.
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Learnt from: yiqingy0
PR: NVIDIA/TensorRT-LLM#5198
File: jenkins/mergeWaiveList.py:0-0
Timestamp: 2025-07-22T08:33:49.109Z
Learning: In the TensorRT-LLM waive list merging system, removed lines are always located at the end of the merge waive lists, which is why the mergeWaiveList.py script uses reverse traversal - it's an optimization for this specific domain constraint.
tests/integration/defs/disaggregated/test_disaggregated_etcd.py (1)

247-249: Backend casing updated to 'DEFAULT' — good.

Matches the standardized uppercase convention.

Also applies to: 255-257

examples/disaggregated/slurm/simple_example/disagg_config.yaml (1)

4-4: Incorrect conflation of model backend with cache transceiver backend

The top-level backend: pytorch in examples/disaggregated/slurm/simple_example/disagg_config.yaml drives which inference engine to launch (PyTorch vs. TRT), not the KV‐cache transceiver. Cache transceiver settings live under each server’s cache_transceiver_config section. You can safely leave or rename the top-level backend field according to its own schema (e.g. backend: pytorch).

Likely an incorrect or invalid review comment.

tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (3)

277-278: LGTM: consistent use of uppercase "DEFAULT"

Aligned with new accepted literals.


380-381: LGTM: consistent use of uppercase "DEFAULT"

Aligned with new accepted literals.


134-135: LGTM: backend standardized to "DEFAULT"
Verified there are no remaining lowercase "default" or "ucx" references in the repo.

tests/integration/defs/accuracy/test_disaggregated_serving.py (2)

247-248: Consistent backend literal update to "DEFAULT" — LGTM

All these updates align with the new uppercase convention. No functional concerns.

Also applies to: 256-258, 291-293, 331-333, 339-341, 382-385, 396-398, 535-536, 575-576, 624-626, 631-633, 675-677, 682-684


426-429: Lowercase "default" backend literals missed; update to "DEFAULT" for consistency and validation compatibility

A few cache_transceiver_config backends still use "default" (lowercase). Please update to "DEFAULT" to match the new accepted values and prevent config parsing/validation mismatches.

Apply this diff:

@@
         ctx_server_config = {
             "disable_overlap_scheduler": True,
             "guided_decoding_backend": backend,
             "cache_transceiver_config": {
-                "backend": "default"
+                "backend": "DEFAULT"
             }
         }
         gen_server_config = {
             "guided_decoding_backend": backend,
             "cache_transceiver_config": {
-                "backend": "default"
+                "backend": "DEFAULT"
             }
         }
@@
         ctx_server_config = {
             "disable_overlap_scheduler": True,
             "speculative_config": speculative_decoding_config,
             "kv_cache_config": {
                 "free_gpu_memory_fraction": 0.8,
             },
             "guided_decoding_backend": backend,
             "cache_transceiver_config": {
-                "backend": "default"
+                "backend": "DEFAULT"
             }
         }
         gen_server_config = {
             "disable_overlap_scheduler": True,
             "speculative_config": speculative_decoding_config,
             "kv_cache_config": {
                 "free_gpu_memory_fraction": 0.8,
             },
             "guided_decoding_backend": backend,
             "cache_transceiver_config": {
-                "backend": "default"
+                "backend": "DEFAULT"
             }
         }

Also applies to: 431-435, 474-477, 485-488

Likely an incorrect or invalid review comment.

examples/disaggregated/README.md (1)

24-32: Good: example config uses uppercase backends and disables overlap on context

The examples align with the new uppercase backend convention and the current limitation on overlap scheduler for context servers.

Also applies to: 35-40

@@ -277,7 +277,7 @@ We also conducted performance evaluations of Qwen 3 on GB200 GPUs. The data indi

### Reproducing Steps

- We provide a set of scripts to reproduce the performance data presented in this paper. Please refer to the usage instructions described in [this document](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated/slurm).
+ We provide a set of scripts to reproduce the performance data presented in this paper. Please refer to the usage instructions described in [this document](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated/benchmark/slurm).

💡 Verification agent

🧩 Analysis chain

Updated link path — please verify the URL resolves

The path now points to examples/disaggregated/benchmark/slurm. Verify it exists and renders as expected.


🏁 Script executed:

#!/bin/bash
set -euo pipefail
URL="https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated/benchmark/slurm"
# Expect: HTTP 200 for HTML content
curl -sI "$URL" | sed -n '1p;/^content-type:/Ip'

Length of output: 297


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Locate any directory named "slurm" in the repo to verify the correct path
fd -t d slurm

Length of output: 97


Fix broken link to Slurm scripts in blog post

The link to examples/disaggregated/benchmark/slurm returns a 404. Update it to point to the existing examples/disaggregated/slurm directory.

• File: docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md
• Line: 280

Suggested diff:

- [this document](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated/benchmark/slurm)
+ [this document](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated/slurm)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- We provide a set of scripts to reproduce the performance data presented in this paper. Please refer to the usage instructions described in [this document](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated/benchmark/slurm).
+ We provide a set of scripts to reproduce the performance data presented in this paper. Please refer to the usage instructions described in [this document](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated/slurm).
🤖 Prompt for AI Agents
In docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md at
line 280, the link to the Slurm scripts is broken because it points to a
non-existent directory. Update the URL from
"examples/disaggregated/benchmark/slurm" to "examples/disaggregated/slurm" to
fix the 404 error and ensure the link directs to the correct existing directory.

Comment on lines 198 to 202
'cache_transceiver_config': {
'max_tokens_in_buffer': cache_transceiver_max_num_tokens,
- 'backend': 'default',
+ 'backend': 'DEFAULT',
},
},

💡 Verification agent

🧩 Analysis chain

Sanity check: ensure no lingering lowercase backend literals remain.

As a follow-up, verify no configs or code still use backend: default/ucx/etc. in lowercase.

Run:

Also applies to: 226-229


🏁 Script executed:

#!/bin/bash
# Find lingering lowercase backend literals in YAML/Python/JSON/TOML
rg -n --no-heading -e $'backend:\\s*default\\b' -e $'backend:\\s*ucx\\b' -e $'backend:\\s*mpi\\b' -e $'backend:\\s*nixl\\b' -e $'\"backend\"\\s*:\\s*\"default\"' -e $'\\'backend\\'\\s*:\\s*\\'default\\'' -e $'backend:\\s*\"default\"'

Length of output: 616


Update lingering lowercase backend literals in tests

The search uncovered lingering lowercase "backend": "default" entries in the integration tests. Please update these to use the uppercase literal ("DEFAULT") for consistency:

  • tests/integration/defs/accuracy/test_disaggregated_serving.py:427
  • tests/integration/defs/accuracy/test_disaggregated_serving.py:433
  • tests/integration/defs/accuracy/test_disaggregated_serving.py:475
  • tests/integration/defs/accuracy/test_disaggregated_serving.py:486
🤖 Prompt for AI Agents
In examples/disaggregated/slurm/benchmark/gen_yaml.py around lines 198 to 202,
update any occurrences of the lowercase string "default" used as a backend
literal to the uppercase "DEFAULT" to maintain consistency. Search through the
specified integration test files for similar lowercase literals and replace them
with the uppercase version as well.
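
For reference, a hypothetical one-off helper (not part of this PR) that would perform the replacement in the flagged test file; the file list and regex are assumptions for illustration:

import re
from pathlib import Path

# File flagged in the review as still containing lowercase backend literals.
FILES = ["tests/integration/defs/accuracy/test_disaggregated_serving.py"]
# Matches "backend": "default" / "ucx" / "nixl" / "mpi" in Python dict literals.
PATTERN = re.compile(r'("backend"\s*:\s*")(default|ucx|nixl|mpi)(")')

for path in map(Path, FILES):
    text = path.read_text()
    updated = PATTERN.sub(lambda m: m.group(1) + m.group(2).upper() + m.group(3), text)
    if updated != text:
        path.write_text(updated)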

Comment on lines +5 to +12
context_servers:
num_instances: 1
urls:
- "ctx_hostname:8001"
generation_servers:
num_instances: 1
urls:
- "gen_hostname:8002"

🛠️ Refactor suggestion

Consider embedding per-server cache_transceiver_config blocks

Other configs in this PR place the transceiver backend under each server type. If this proxy config is intended to route disaggregated requests, consider aligning structure:

  • context_servers.cache_transceiver_config.backend: DEFAULT
  • generation_servers.cache_transceiver_config.backend: DEFAULT

Example structure:

hostname: localhost
port: 8000
context_servers:
  num_instances: 1
  urls:
    - "ctx_hostname:8001"
  cache_transceiver_config:
    backend: DEFAULT
generation_servers:
  num_instances: 1
  urls:
    - "gen_hostname:8002"
  cache_transceiver_config:
    backend: DEFAULT

Happy to propose a concrete patch once the schema is confirmed.

🤖 Prompt for AI Agents
In examples/disaggregated/slurm/simple_example/disagg_config.yaml around lines 5
to 12, the configuration lacks per-server cache_transceiver_config blocks under
context_servers and generation_servers. To fix this, embed a
cache_transceiver_config section with a backend key set to DEFAULT under both
context_servers and generation_servers, aligning the structure with other
configs in the PR and supporting proper routing of disaggregated requests.

Comment on lines +16 to +21
srun --container-image=${container_image} \
     --container-mounts=${mount_paths} \
     -N 2 --ntasks-per-node=4 \
     --mpi=pmix \
     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${ctx_port} --extra_llm_api_options ${work_path}/ctx_extra-llm-api-config.yaml"


⚠️ Potential issue

srun will block; background long-running services or they’ll never reach subsequent steps

As written, this srun will block until the context service exits, so the generation and proxy steps won’t run. Run long-lived services concurrently.

Apply this diff:

 srun --container-image=${container_image} \
      --container-mounts=${mount_paths} \
      -N 2 --ntasks-per-node=4 \
      --mpi=pmix \
-     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${ctx_port} --extra_llm_api_options ${work_path}/ctx_extra-llm-api-config.yaml"
+     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${ctx_port} --extra_llm_api_options ${work_path}/ctx_extra-llm-api-config.yaml" &

Optional hardening at top of the script:

 #!/bin/bash
+set -euo pipefail
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-srun --container-image=${container_image} \
-     --container-mounts=${mount_paths} \
-     -N 2 --ntasks-per-node=4 \
-     --mpi=pmix \
-     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${ctx_port} --extra_llm_api_options ${work_path}/ctx_extra-llm-api-config.yaml"
+#!/bin/bash
+set -euo pipefail
+srun --container-image=${container_image} \
+     --container-mounts=${mount_paths} \
+     -N 2 --ntasks-per-node=4 \
+     --mpi=pmix \
+     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${ctx_port} --extra_llm_api_options ${work_path}/ctx_extra-llm-api-config.yaml" &
🤖 Prompt for AI Agents
In examples/disaggregated/slurm/simple_example/launch.slurm around lines 16 to
21, the srun command is currently blocking, preventing subsequent steps from
running. To fix this, modify the srun command to run the long-lived service in
the background by appending an ampersand (&) at the end of the command string.
This allows the script to continue executing subsequent commands concurrently.
Optionally, add script hardening at the top of the file as suggested.
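A further hardening idea, sketched below and not part of the original script: poll the servers before launching the proxy so it does not race their startup. This assumes trtllm-serve exposes a /health endpoint and reuses the placeholder hostnames from disagg_config.yaml; adjust the probe if the actual readiness check differs.

# Sketch only: wait for both backgrounded servers before starting the proxy.
# ctx_hostname/gen_hostname are the placeholders from disagg_config.yaml.
for url in "http://ctx_hostname:${ctx_port}/health" "http://gen_hostname:${gen_port}/health"; do
  until curl -sf "${url}" > /dev/null; do
    echo "waiting for ${url} ..."
    sleep 10
  done
done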

Comment on lines +22 to +28
# Launch a generation with `tp_size=4` using one 4-GPU node.
srun --container-image=${container_image} \
     --container-mounts=${mount_paths} \
     -N 1 --ntasks-per-node=4 \
     --mpi=pmix \
     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${gen_port} --extra_llm_api_options ${work_path}/gen_extra-llm-api-config.yaml"


🛠️ Refactor suggestion

Comment says tp_size=4 but command uses --tp_size 8 — make them consistent

Either adjust the comment or the command. If you intend tp=4 on a single 4-GPU node, update the flag accordingly.

Apply this diff:

-# Launch a generation with `tp_size=4` using one 4-GPU node.
+# Launch a generation with `tp_size=4` using one 4-GPU node.
 srun --container-image=${container_image} \
      --container-mounts=${mount_paths} \
      -N 1 --ntasks-per-node=4 \
      --mpi=pmix \
-     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${gen_port} --extra_llm_api_options ${work_path}/gen_extra-llm-api-config.yaml"
+     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 4 --host 0.0.0.0 --port ${gen_port} --extra_llm_api_options ${work_path}/gen_extra-llm-api-config.yaml" &

Finally, wait for backgrounded services (keep proxy in foreground to hold the allocation):

- srun --container-image=${container_image} \
+ srun --container-image=${container_image} \
      --container-mounts=${mount_paths} \
      -N 1 --ntasks-per-node=1 \
      --mpi=pmix \
      bash -c "trtllm-llmapi-launch trtllm-serve disaggregated -c ${work_path}/disagg_config.yaml"
+wait
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-# Launch a generation with `tp_size=4` using one 4-GPU node.
-srun --container-image=${container_image} \
-     --container-mounts=${mount_paths} \
-     -N 1 --ntasks-per-node=4 \
-     --mpi=pmix \
-     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${gen_port} --extra_llm_api_options ${work_path}/gen_extra-llm-api-config.yaml"
+# Launch a generation with `tp_size=4` using one 4-GPU node.
+srun --container-image=${container_image} \
+     --container-mounts=${mount_paths} \
+     -N 1 --ntasks-per-node=4 \
+     --mpi=pmix \
+     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 4 --host 0.0.0.0 --port ${gen_port} --extra_llm_api_options ${work_path}/gen_extra-llm-api-config.yaml" &
+srun --container-image=${container_image} \
+     --container-mounts=${mount_paths} \
+     -N 1 --ntasks-per-node=1 \
+     --mpi=pmix \
+     bash -c "trtllm-llmapi-launch trtllm-serve disaggregated -c ${work_path}/disagg_config.yaml"
+wait
🤖 Prompt for AI Agents
In examples/disaggregated/slurm/simple_example/launch.slurm around lines 22 to
28, the comment states tp_size=4 but the command uses --tp_size 8, causing
inconsistency. To fix this, update the --tp_size flag in the command to 4 to
match the comment, ensuring the command launches with tp_size=4 on a single
4-GPU node.
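One way to keep the comment and the flag from drifting apart is to derive tp_size from the step geometry instead of hard-coding it; a hypothetical sketch (illustrative variable names, assuming one GPU per task):

# Hypothetical sketch: compute tp_size from the srun step shape so the comment,
# the allocation, and the flag stay consistent.
gen_nodes=1
gen_tasks_per_node=4
gen_tp_size=$(( gen_nodes * gen_tasks_per_node ))   # 4 for one 4-GPU node
srun --container-image=${container_image} \
     --container-mounts=${mount_paths} \
     -N ${gen_nodes} --ntasks-per-node=${gen_tasks_per_node} \
     --mpi=pmix \
     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size ${gen_tp_size} --host 0.0.0.0 --port ${gen_port} --extra_llm_api_options ${work_path}/gen_extra-llm-api-config.yaml" &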

@@ -16,7 +16,7 @@ context_servers:
   enable_partial_reuse: False
   disable_overlap_scheduler: True
   cache_transceiver_config:
-    backend: default
+    backend: DEFAULT

💡 Verification agent

🧩 Analysis chain

Standardization to DEFAULT acknowledged; ensure backward compatibility and consistent quoting.

  • Aligns with the new uppercase convention.
  • Suggest quoting for consistency with other quoted string scalars in the same file.

Scan for missed lowercase occurrences:

Suggested change:

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 34-34


🏁 Script executed:

#!/bin/bash
rg -n -S $'backend:\\s*(\"|\\\')?(default|ucx|nixl|mpi)(\"|\\\')?'

Length of output: 194


I wasn’t able to confirm whether any lowercase backend values remain due to the regex error. Let’s rerun with a corrected pattern:


🏁 Script executed:

#!/bin/bash
# Search for any unquoted or lowercase backend values in YAML configs
rg -n -S -e 'backend:\s*(["\'])?(default|ucx|nixl|mpi)\1?' --glob '*.yaml' tests/integration/defs/disaggregated/

Length of output: 295


We need a corrected regex (Rust regex doesn’t support backreferences). Let’s rerun:


🏁 Script executed:

#!/bin/bash
# Verify no lowercase or unquoted backend values exist in YAML configs
rg -n -S -e "backend:\s*['\"]?(default|ucx|nixl|mpi)['\"]?" --glob '*.yaml' tests/integration/defs/disaggregated/

Length of output: 8747


Quote and standardize all backend: values in disaggregated test configs

We’ve found dozens of unquoted uses of DEFAULT (and some other backends) across
tests/integration/defs/disaggregated/test_configs/*.yaml. To ensure consistency:

  • Use uppercase for all backend identifiers (DEFAULT, UCX, NIXL, MPI).
  • Always wrap the value in double quotes:
    -    backend: DEFAULT
    +    backend: "DEFAULT"
  • Apply the same quoting rule for other backends:
    -    backend: ucx
    +    backend: "UCX"
    -    backend: nixl
    +    backend: "NIXL"
    -    backend: mpi
    +    backend: "MPI"

Suggested one-liner to catch remaining unquoted entries:

rg -n -S --glob '*.yaml' '^\s*backend:\s*[A-Z]+' tests/integration/defs/disaggregated/test_configs/

And a bulk fix (run from repo root):

find tests/integration/defs/disaggregated/test_configs/ -name '*.yaml' \
  -exec sed -i -E 's/^( *)(backend:\s*)([A-Z]+)$/\1\2"\3"/' {} +
🤖 Prompt for AI Agents
In
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_gentp2.yaml
at line 19, the backend value DEFAULT is unquoted. To fix this, update the
backend value to be uppercase and wrap it in double quotes, changing backend:
DEFAULT to backend: "DEFAULT". Apply this quoting and uppercase standard
consistently to all backend values in the disaggregated test configs directory.

@@ -16,7 +16,7 @@ context_servers:
   enable_partial_reuse: False
   disable_overlap_scheduler: True
   cache_transceiver_config:
-    backend: default
+    backend: DEFAULT

💡 Verification agent

❓ Verification inconclusive

Consistent with uppercase convention; minor quoting nit and migration check.

  • Change is correct per the new allowed literals.
  • Recommend quoting for consistency with other string fields.

Find missed lowercase:

Diff:

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 34-34


🏁 Script executed:

#!/bin/bash
rg -n -S $'backend:\\s*(\"|\\\')?(default|ucx|nixl|mpi)(\"|\\\')?'

Length of output: 194


🏁 Script executed:

#!/usr/bin/env bash
# Search for any occurrences of lowercase backend values (quoted or unquoted) in YAML configs
rg -n -E "backend:\s*(['\"])?(default|ucx|nixl|mpi)\1" --glob "*.yaml"

Length of output: 471


#!/usr/bin/env bash
set -euo pipefail

echo "Searching for any lowercase backend values in YAML files (unquoted, single-quoted, or double-quoted)..."
for val in default ucx nixl mpi; do
  echo "=== Checking unquoted: backend: $val ==="
  grep -R --include='*.yaml' -n "backend: $val" . || true

  echo "=== Checking double-quoted: backend: \"$val\" ==="
  grep -R --include='*.yaml' -n "backend: \"$val\"" . || true

  echo "=== Checking single-quoted: backend: '$val' ==="
  grep -R --include='*.yaml' -n "backend: '$val'" . || true
done

Quote uppercase backend literal for consistency

To align with other string fields, please add quotes around the uppercase DEFAULT values in your YAML:

• tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml
– Line 19: change backend: DEFAULTbackend: "DEFAULT"
– Line 34: change backend: DEFAULTbackend: "DEFAULT"

Diff:

--- a/tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml
+++ b/tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml
@@ -19,1 +19,1 @@
-    backend: DEFAULT
+    backend: "DEFAULT"
@@ -34,1 +34,1 @@
-    backend: DEFAULT
+    backend: "DEFAULT"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    backend: DEFAULT
+    backend: "DEFAULT"
🤖 Prompt for AI Agents
In
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml
at lines 19 and 34, the uppercase string literal DEFAULT under the backend key
should be enclosed in quotes for consistency with other string fields. Update
both lines to use backend: "DEFAULT" instead of backend: DEFAULT.

@@ -11,7 +11,7 @@ context_servers:
   pipeline_parallel_size: 1
   enable_attention_dp: True
   cache_transceiver_config:
-    backend: default
+    backend: DEFAULT

💡 Verification agent

🧩 Analysis chain

Backend value updated; consider quoting and verify no lowercase remains elsewhere.

  • Uppercase DEFAULT matches the updated literals; looks good.
  • For stylistic consistency with other quoted strings in the file, consider quoting.

Repo-wide check:

Patch:

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 23-23


🏁 Script executed:

#!/bin/bash
rg -n -S $'backend:\\s*(\"|\\\')?(default|ucx|nixl|mpi)(\"|\\\')?'

Length of output: 194


To catch any unquoted or lowercase default occurrences, let’s retry with a simpler pattern. Please run:


🏁 Script executed:

#!/bin/bash
# Search for any backend: DEFAULT, default, ucx, nixl, or mpi (quoted or unquoted)
rg -n "backend:\s*(\"|')?(DEFAULT|default|ucx|nixl|mpi)(\"|')?"

Length of output: 9011


Quote DEFAULT in YAML backend entries for consistency

Multiple test config files use an unquoted DEFAULT value for the backend key. To match the style of other quoted literals (e.g. "ucx", "nixl", "mpi"), please wrap DEFAULT in quotes.

Files to update (unquoted backend: DEFAULT occurrences):

  • tests/integration/defs/disaggregated/test_configs/**/*.yaml
  • examples/disaggregated/disagg_config.yaml (lines 14, 22)

Example patch:

-    backend: DEFAULT
+    backend: "DEFAULT"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    backend: DEFAULT
+    backend: "DEFAULT"
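Worth noting that the quotes are purely stylistic in YAML: both spellings load to the same Python string, so this is a consistency cleanup rather than a behavioral change. A quick illustrative check (snippet is not from the repo):

import yaml

# Unquoted and double-quoted scalars parse to the identical string value,
# so quoting DEFAULT only affects style, not behavior.
unquoted = yaml.safe_load('backend: DEFAULT')
quoted = yaml.safe_load('backend: "DEFAULT"')
assert unquoted == quoted == {'backend': 'DEFAULT'}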
🤖 Prompt for AI Agents
In
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp.yaml
at line 14, the backend value DEFAULT is unquoted. To maintain consistency with
other YAML files where backend values like "ucx" or "mpi" are quoted, update
this line to quote DEFAULT as "DEFAULT". Also, apply this quoting style to all
other unquoted DEFAULT backend entries in the specified test config YAML files
and examples/disaggregated/disagg_config.yaml at lines 14 and 22.

@@ -16,7 +16,7 @@ context_servers:
   batch_sizes: [1,3000]
   disable_overlap_scheduler: True
   cache_transceiver_config:
-    backend: default
+    backend: DEFAULT

💡 Verification agent

🧩 Analysis chain

Uppercase backend adopted; add quotes for consistency and verify global sweep.

  • Matches PR objective to standardize backend literals.
  • Optional: quote the value to align with other quoted scalars (e.g., "pytorch").

Sweep for lowercase literals:

Localized change:

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 37-37


🏁 Script executed:

#!/bin/bash
rg -n -S $'backend:\\s*(\"|\\\')?(default|ucx|nixl|mpi)(\"|\\\')?'

Length of output: 194


🏁 Script executed:

#!/bin/bash
# Scan all YAML files for lowercase backend values
for val in default ucx nixl mpi; do
  echo "Searching for backend: ${val}"
  rg -n -S "backend:\s*${val}" --glob '*.yaml'
done

Length of output: 7846


Quote uppercase backend scalars for consistency

A global sweep found only uppercase backend: values—no lowercase literals—so please wrap all of them in quotes to align with other YAML scalars.

• Files to update:

  • All tests/integration/defs/disaggregated/test_configs/*.yaml entries of backend: DEFAULT
  • examples/disaggregated/slurm/*.yaml entries of backend: UCX

Example diff:

-    backend: DEFAULT
+    backend: "DEFAULT"
-  backend: UCX
+  backend: "UCX"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    backend: DEFAULT
+    backend: "DEFAULT"
🤖 Prompt for AI Agents
In
tests/integration/defs/disaggregated/test_configs/disagg_config_cuda_graph_padding.yaml
at line 19, the backend scalar DEFAULT is unquoted. To maintain consistency with
other YAML scalars, wrap the value DEFAULT in double quotes, changing backend:
DEFAULT to backend: "DEFAULT". Apply this quoting style to all similar uppercase
backend values in the specified YAML files.

Comment on lines +247 to 249
backend: "DEFAULT"
max_tokens_in_buffer: 2048"""


💡 Verification agent

🧩 Analysis chain

Sanity-check: ensure no lingering lowercase 'default' backends remain.

Also confirm the args/parser now expects/normalizes uppercase values.

Also applies to: 255-257


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Find any lowercase backend: default"
rg -n --hidden --no-ignore 'backend:\s*"?default"?' || echo "No lowercase 'default' found (good)."

echo "Find backend enum/choices references (manual inspection target):"
rg -n --hidden --no-ignore 'CacheTransceiver|cache_transceiver|backend' tensorrt_llm | sed -n '1,200p'

Length of output: 22273


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the init_backend validator to see if it normalizes/validates uppercase backend values.
rg -n "init_backend" -A 10 -B 3 tensorrt_llm/llmapi/llm_args.py

Length of output: 529


Sanity-check passed & parser requires uppercase backends
No lowercase "default" backends remain. The init_backend validator in tensorrt_llm/llmapi/llm_args.py (around line 2121) currently returns the raw value, so it will only accept uppercase inputs (e.g. "DEFAULT").

If you’d like to support lowercase too, update that validator to:

@field_validator('backend', mode='before')
def init_backend(cls, v):
    if v is None:
        return 'pytorch'
    return v.upper()

Otherwise, ensure your docs and tests explicitly state that backend names are case-sensitive and must be uppercase.

• tests/integration/defs/disaggregated/test_disaggregated_etcd.py (lines 247-249, 255-257) already use "DEFAULT" correctly.
• tensorrt_llm/llmapi/llm_args.py@2121: adjust init_backend as shown above.
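For illustration, here is a standalone sketch (hypothetical class name, not the real CacheTransceiverConfig) of how a mode='before' validator combined with uppercase Literal values would accept lowercase input while keeping the uppercase form canonical:

from typing import Literal, Optional

from pydantic import BaseModel, field_validator

# Standalone sketch only: uppercase literals are the canonical values, and the
# before-validator normalizes whatever casing the user supplied.
class TransceiverConfigSketch(BaseModel):
    backend: Optional[Literal["DEFAULT", "UCX", "NIXL", "MPI"]] = None

    @field_validator("backend", mode="before")
    @classmethod
    def normalize_backend(cls, v):
        return v.upper() if isinstance(v, str) else v

print(TransceiverConfigSketch(backend="ucx").backend)      # UCX
print(TransceiverConfigSketch(backend="DEFAULT").backend)  # DEFAULT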

🤖 Prompt for AI Agents
In tests/integration/defs/disaggregated/test_disaggregated_etcd.py around lines
247 to 249, the backend value is correctly uppercase as "DEFAULT". To ensure
consistent handling of backend names, update the init_backend validator in
tensorrt_llm/llmapi/llm_args.py near line 2121 to convert any input to uppercase
by returning v.upper(), and default to 'pytorch' if None. This change will allow
the validator to accept lowercase inputs by normalizing them to uppercase,
aligning with the test and documentation expectations.

@tensorrt-cicd

PR_Github #14673 [ run ] triggered by Bot

@tensorrt-cicd

PR_Github #14673 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11074 completed with status: 'FAILURE'

@kaiyux kaiyux requested a review from qiaoxj07 August 10, 2025 03:30