[TRTLLM-7030][fix] Refactor the example doc of dist-serving #6766

Open · wants to merge 2 commits into base: main

Conversation

@Shixiaowei02 (Collaborator) commented Aug 9, 2025

Summary by CodeRabbit

  • New Features

    • Expanded and reorganized documentation for disaggregated serving, including detailed SLURM cluster usage instructions, updated configuration examples, and new sample YAML files and batch scripts.
  • Bug Fixes

    • Improved validation for context-only requests and disaggregated server responses to enhance error handling.
  • Refactor

    • Updated all configuration and code references for cache transceiver backend values from lowercase (e.g., "default", "ucx") to uppercase (e.g., "DEFAULT", "UCX").
  • Documentation

    • Clarified and extended user guides, removed outdated FAQ entries, and updated documentation links and examples for consistency across the project.
  • Tests

    • Adjusted test configurations and assertions to align with updated backend value casing requirements.

coderabbitai bot (Contributor) commented Aug 9, 2025

📝 Walkthrough

Walkthrough

This update standardizes the backend string values for cache transceiver configuration from lowercase (e.g., "default", "ucx") to uppercase (e.g., "DEFAULT", "UCX") across code, configs, and tests. It also updates related documentation, environment variables, and adds validation checks for disaggregated serving, with expanded and reorganized documentation for SLURM and dynamic scaling.

Changes

Cohort / File(s) Change Summary
Backend String Uppercase Standardization
tensorrt_llm/llmapi/llm_args.py, examples/disaggregated/disagg_config.yaml, examples/disaggregated/slurm/benchmark/gen_yaml.py, tests/integration/defs/accuracy/test_disaggregated_serving.py, tests/integration/defs/disaggregated/test_configs/*, tests/integration/defs/disaggregated/test_disaggregated_etcd.py, tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py, tests/unittest/llmapi/test_llm_args.py
All instances of backend string values in cache transceiver configs and related code/tests changed from lowercase to uppercase (e.g., "default" → "DEFAULT", "ucx" → "UCX"). Type annotations updated accordingly.
Documentation and Usage Examples
benchmarks/cpp/README.md, examples/cpp/executor/README.md, examples/disaggregated/README.md, examples/wide_ep/README.md, examples/wide_ep/slurm_scripts/README.md, examples/wide_ep/slurm_scripts/submit.sh, docs/source/advanced/disaggregated-service.md, docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md
Documentation updated for new environment variables, uppercase backend strings, and reorganized/expanded instructions for disaggregated serving, SLURM usage, and dynamic scaling. Deprecated/removed outdated FAQ and MPI-based instructions.
New Disaggregated SLURM Example Assets
examples/disaggregated/slurm/simple_example/ctx_extra-llm-api-config.yaml, examples/disaggregated/slurm/simple_example/gen_extra-llm-api-config.yaml, examples/disaggregated/slurm/simple_example/disagg_config.yaml, examples/disaggregated/slurm/simple_example/launch.slurm
Added new YAML config files and a SLURM batch script for launching disaggregated serving examples, with correct backend casing and explicit configuration for context/generation servers.
Validation and Error Handling Enhancements
tensorrt_llm/_torch/pyexecutor/py_executor.py, tensorrt_llm/serve/openai_disagg_server.py
Added assertion to disallow context-only requests when disaggregation is disabled, and added validation for non-null ctx_request_id in context server response.
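For illustration, below is a minimal, self-contained sketch of the kind of guard described above for py_executor.py; the class and attribute names (Request, is_context_only, kv_cache_transceiver) are assumptions made for this example, not the actual implementation.

# Hypothetical sketch only; names are illustrative, not the real py_executor.py code.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    is_context_only: bool = False


class Executor:
    def __init__(self, kv_cache_transceiver: Optional[object] = None) -> None:
        # A transceiver is only configured when disaggregated serving is enabled.
        self.kv_cache_transceiver = kv_cache_transceiver

    def validate(self, request: Request) -> None:
        # Context-only requests are only meaningful in disaggregated mode,
        # so reject them when no KV-cache transceiver is configured.
        if request.is_context_only:
            assert self.kv_cache_transceiver is not None, (
                "Context-only requests are not supported when disaggregation is disabled")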

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Proxy/Router
    participant ContextServer
    participant GenerationServer

    Client->>Proxy/Router: Send request (context/generation)
    Proxy/Router->>ContextServer: Forward context request
    ContextServer-->>Proxy/Router: Return response (with disaggregated_params)
    Proxy/Router->>GenerationServer: Forward generation request (with ctx_request_id)
    GenerationServer-->>Proxy/Router: Return generated output
    Proxy/Router-->>Client: Return final response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

Documentation

Suggested reviewers

  • pcastonguay
  • kaiyux
  • nv-guomingz


@Shixiaowei02 (Collaborator, Author) commented:

/bot run

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 15

🔭 Outside diff range comments (4)
examples/disaggregated/slurm/benchmark/gen_yaml.py (1)

1-1: examples/disaggregated/slurm/benchmark/gen_yaml.py – Add NVIDIA SPDX header

Per our Python file conventions, every non-test .py must start with the NVIDIA SPDX copyright header
including the current year (2025). Please insert the following lines at the very top of
examples/disaggregated/slurm/benchmark/gen_yaml.py, before any imports:

+ # SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ # SPDX-License-Identifier: Apache-2.0
  import argparse

This matches the canonical header used throughout the repo and satisfies compliance.

tests/integration/defs/disaggregated/test_disaggregated_etcd.py (1)

1-2: Update copyright year to include 2025.

Per project guidelines, headers should include the current year for .py files.

-# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
tensorrt_llm/llmapi/llm_args.py (1)

1050-1054: Normalize backend to uppercase for backward compatibility

The Literal now enforces uppercase inputs. Existing external configs using "default"/"ucx" (lowercase) will fail validation. Add a pre-validation normalizer to accept lowercase and coerce to uppercase, keeping the codebase strict and user-facing BC:

 @PybindMirror.mirror_pybind_fields(_CacheTransceiverConfig)
 class CacheTransceiverConfig(StrictBaseModel, PybindMirror):
@@
-    backend: Optional[Literal["DEFAULT", "UCX", "NIXL", "MPI"]] = Field(
+    backend: Optional[Literal["DEFAULT", "UCX", "NIXL", "MPI"]] = Field(
         default=None,
         description=
         "The communication backend type to use for the cache transceiver.")
@@
     def _to_pybind(self):
         return _CacheTransceiverConfig(
             backend=_CacheTransceiverBackendType.from_string(self.backend),
             max_tokens_in_buffer=self.max_tokens_in_buffer)
+
+    @field_validator('backend', mode='before')
+    @classmethod
+    def normalize_backend(cls, v):
+        if v is None:
+            return v
+        if isinstance(v, str):
+            v = v.upper()
+        return v

This preserves compatibility with legacy lowercase configs while enforcing uppercase internally.
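
As a quick sanity check, assuming the normalize_backend pre-validator above is adopted (and using the import path from this PR's file list), legacy lowercase input would be coerced rather than rejected:

from tensorrt_llm.llmapi.llm_args import CacheTransceiverConfig

# Without the pre-validator, a lowercase value fails the Literal validation;
# with it, the value is upper-cased before validation runs.
cfg = CacheTransceiverConfig(backend="ucx", max_tokens_in_buffer=1024)
assert cfg.backend == "UCX"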

examples/disaggregated/README.md (1)

232-236: Typo in metadata config: refersh_interval → refresh_interval

The YAML key is misspelled; users copying this will get a non-working config.

Apply this diff:

- refersh_interval: 10.0
+ refresh_interval: 10.0
🧹 Nitpick comments (26)
tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only_trt_backend.yaml (1)

16-16: Keep quoting style consistent across configs

Uppercasing looks good. For consistency with other files that quote this literal, consider quoting DEFAULT.

-    backend: DEFAULT
+    backend: "DEFAULT"
tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only.yaml (1)

17-17: Quote the backend literal for consistency

Functionally equivalent, but quoting matches other config files and avoids mixed styles.

-    backend: DEFAULT
+    backend: "DEFAULT"
tests/integration/defs/disaggregated/test_configs/disagg_config_load_balance.yaml (1)

22-22: Unify quoting within the same file

Line 22 is unquoted while line 41 is quoted. Pick one style; recommending quoted to match other configs.

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 41-41

tests/integration/defs/disaggregated/test_configs/disagg_config_trt_backend.yaml (1)

13-13: Quote backend literals for consistency across test configs

Both entries are correct; quoting them keeps style uniform with other files.

-    backend: DEFAULT
+    backend: "DEFAULT"
@@
-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 21-21

tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse.yaml (1)

18-18: Quote backend string for consistency

YAML accepts bare words, but quoting string literals avoids ambiguity and keeps style consistent with other example configs.

Apply this diff:

-    backend: DEFAULT
+    backend: "DEFAULT"

And:

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 33-33

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1.yaml (1)

13-13: Quote backend string for consistency

For consistency with other configs and to avoid YAML ambiguity, quote the string literal.

Apply this diff:

-    backend: DEFAULT
+    backend: "DEFAULT"

And:

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 21-21

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1_trt_backend.yaml (1)

11-11: Quote backend string for consistency

Recommend quoting to match style used in example configs.

Apply this diff:

-    backend: DEFAULT
+    backend: "DEFAULT"

And:

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 19-19

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one_mtp.yaml (1)

17-17: Quote literal for YAML consistency.

To match other string fields (e.g., backend: "pytorch") and avoid accidental keyword interpretation by tooling, quote the value.

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 26-26

tests/integration/defs/disaggregated/test_configs/disagg_config_conditional.yaml (1)

21-21: Prefer quoting string literals.

Minor consistency tweak with other quoted strings in the file.

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 36-36

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite.yaml (1)

13-13: Nit: quote the string for clarity.

Keeps style consistent and avoids any YAML/tooling edge-cases.

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 21-21

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp4_genpp4.yaml (1)

19-19: Nit: quote the backend value.

Stylistic consistency with other quoted strings.

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 34-34

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp_attention_dp_overlap.yaml (1)

17-17: Nit: add quotes around string literal.

For consistency with the rest of the YAML config strings.

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 27-27

tests/integration/defs/disaggregated/test_configs/disagg_config_diff_max_tokens.yaml (1)

13-13: Uppercase ‘backend’ values confirmed; ensure parser normalization

All occurrences of backend: default/ucx/mpi/nixl have been updated to uppercase (DEFAULT, UCX, etc.), and a repository-wide scan found no remaining lowercase entries.

Next steps:

  • Verify that the configuration loader (e.g., in your YAML parsing code) normalizes incoming backend values to uppercase at parse-time to maintain backward compatibility for configs that may still use lowercase.
  • No changes to the existing YAML files are required.
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_overlap_cuda_graph.yaml (1)

12-12: Nit: unify YAML quoting style for string scalars

Here DEFAULT is unquoted, while other configs in this PR quote "DEFAULT". Consider standardizing (prefer quoting) for readability and to avoid YAML edge-cases.

Also applies to: 23-23

tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance_deepseek_v3.yaml (1)

20-20: Nit: consistent quoting across configs

These lines use quotes, while some other configs in this PR don’t. Consider consistently quoting string scalars repo-wide.

Also applies to: 36-36

tests/integration/defs/disaggregated/test_configs/disagg_config_torch_sampler.yaml (1)

19-19: Nit: align YAML scalar quoting style

Recommend choosing a single style (quoted vs unquoted) for string values across all disaggregated configs.

Also applies to: 35-35

tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (1)

12-12: Nit: standardize quoting

Unify quoting of string scalars across the PR for readability and consistency.

Also applies to: 21-21

benchmarks/cpp/README.md (1)

339-342: Add language to fenced code blocks (bash) for readability and consistency.

Use bash on the opening triple backticks enclosing the export/mpirun snippets so tooling renders it properly.

Example:

-```
+```bash
 export TRTLLM_USE_UCX_KVCACHE=1
 mpirun -n ${proc} benchmarks/disaggServerBenchmark ...
-```
+```

Also applies to: 347-351

examples/disaggregated/slurm/simple_example/ctx_extra-llm-api-config.yaml (1)

4-6: Optional: quote backend value for style consistency across configs.

Some configs use quoted strings (e.g., "DEFAULT"). Consider:

-  backend: UCX
+  backend: "UCX"
examples/cpp/executor/README.md (1)

129-134: Specify language on fenced block to satisfy markdownlint (MD040).

Mark this code block as bash.

-```
+```bash
 export TRTLLM_USE_UCX_KVCACHE=1

 mpirun -n <num_ranks> --allow-run-as-root --oversubscribe ./executorExampleDisaggregated ...
-```
+```
tests/unittest/llmapi/test_llm_args.py (2)

669-672: LGTM on "UCX"; add a small fix for Ruff F405

The value change is correct. Ruff warns F405 here due to star import; add an inline noqa to keep the test concise:

-        config = CacheTransceiverConfig(backend="UCX",
+        config = CacheTransceiverConfig(backend="UCX",  # noqa: F405
                                         max_tokens_in_buffer=1024)

Alternatively, explicitly import the class to satisfy both Ruff and the project’s “maintain namespace” guideline:

import tensorrt_llm.llmapi.llm_args as llm_args
config = llm_args.CacheTransceiverConfig(backend="UCX", max_tokens_in_buffer=1024)

677-677: Mirror the F405 fix on the negative test

Add noqa here as well:

-            CacheTransceiverConfig(backend="UCX", invalid_config="should_fail")
+            CacheTransceiverConfig(backend="UCX", invalid_config="should_fail")  # noqa: F405

If you adopt namespaced imports as above, switch to llm_args.CacheTransceiverConfig to avoid noqa.

tensorrt_llm/serve/openai_disagg_server.py (1)

207-209: Good defensive check; consider tightening validation

The None check is useful. If the protocol guarantees that ctx_request_id is a positive integer, also validate its type and range:

  • isinstance(ctx_request_id, int)
  • ctx_request_id > 0

Optionally return a 4xx HTTP error upstream instead of generic ValueError, depending on error handling policy.

I can add a focused unit/integration test that asserts a missing/invalid ctx_request_id raises the expected error. Want a PR follow-up?
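
As a rough sketch of that tightening (assuming a FastAPI-style server; the helper name and status-code choice are illustrative, not the actual openai_disagg_server.py code):

from fastapi import HTTPException


def require_ctx_request_id(ctx_request_id) -> int:
    # Reject missing or malformed ids with a client-visible HTTP error
    # instead of a generic ValueError; the exact status code is a policy choice.
    if ctx_request_id is None:
        raise HTTPException(status_code=400,
                            detail="Context server response is missing ctx_request_id")
    if not isinstance(ctx_request_id, int) or ctx_request_id <= 0:
        raise HTTPException(status_code=400,
                            detail=f"Invalid ctx_request_id: {ctx_request_id!r}")
    return ctx_request_id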

examples/disaggregated/README.md (3)

113-115: Add a language to the fenced code block (markdownlint MD040)

Specify the language for the code block.

Apply this diff:

-```
+```bash
 python3 ./clients/disagg_client.py -c disagg_config.yaml -p ./clients/prompts.json -e chat

171-199: SLURM srun example: ensure concurrency and parameter consistency

Consider mirroring the launch.slurm fixes here:
  • Background the long-running srun services for the context and generation servers; keep the proxy in the foreground.
  • Ensure the generation tp_size in the command matches the comment (4).


Would you like me to propose a full revised snippet aligned with the launch.slurm pattern?

245-250: Minor: log filename reuse

The example for adding another generation server writes to log_gen_0 again. Consider log_gen_1 to avoid clobbering.


-    --metadata_server_config_file ./metadata_config.yaml &> log_gen_0 &
+    --metadata_server_config_file ./metadata_config.yaml &> log_gen_1 &
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d643aef and ea3715d.

📒 Files selected for processing (55)
  • benchmarks/cpp/README.md (1 hunks)
  • docs/source/advanced/disaggregated-service.md (0 hunks)
  • docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md (1 hunks)
  • examples/cpp/executor/README.md (1 hunks)
  • examples/disaggregated/README.md (5 hunks)
  • examples/disaggregated/disagg_config.yaml (1 hunks)
  • examples/disaggregated/slurm/benchmark/gen_yaml.py (2 hunks)
  • examples/disaggregated/slurm/simple_example/ctx_extra-llm-api-config.yaml (1 hunks)
  • examples/disaggregated/slurm/simple_example/disagg_config.yaml (1 hunks)
  • examples/disaggregated/slurm/simple_example/gen_extra-llm-api-config.yaml (1 hunks)
  • examples/disaggregated/slurm/simple_example/launch.slurm (1 hunks)
  • examples/wide_ep/README.md (1 hunks)
  • examples/wide_ep/slurm_scripts/README.md (2 hunks)
  • examples/wide_ep/slurm_scripts/submit.sh (1 hunks)
  • tensorrt_llm/_torch/pyexecutor/py_executor.py (1 hunks)
  • tensorrt_llm/llmapi/llm_args.py (1 hunks)
  • tensorrt_llm/serve/openai_disagg_server.py (1 hunks)
  • tests/integration/defs/accuracy/test_disaggregated_serving.py (10 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance_deepseek_v3.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse_deepseek_v3.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_conditional.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_conditional_deepseek_v3.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_genpp2.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_gentp2.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp4_genpp4.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp_attention_dp_overlap.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_two_mtp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1_trt_backend.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one_mtp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap_cuda_graph.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_overlap_cuda_graph.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2pp2_gentp2pp2.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cuda_graph_padding.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_diff_max_tokens.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only_trt_backend.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_load_balance.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_mixed.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_overlap.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_torch_sampler.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_trt_backend.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_disaggregated_etcd.py (1 hunks)
  • tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (3 hunks)
  • tests/unittest/llmapi/test_llm_args.py (1 hunks)
💤 Files with no reviewable changes (1)
  • docs/source/advanced/disaggregated-service.md
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • examples/disaggregated/slurm/benchmark/gen_yaml.py
  • tests/integration/defs/disaggregated/test_disaggregated_etcd.py
  • tensorrt_llm/serve/openai_disagg_server.py
  • tests/integration/defs/accuracy/test_disaggregated_serving.py
  • tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py
  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tests/unittest/llmapi/test_llm_args.py
  • tensorrt_llm/llmapi/llm_args.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • examples/disaggregated/slurm/benchmark/gen_yaml.py
  • tests/integration/defs/disaggregated/test_disaggregated_etcd.py
  • tensorrt_llm/serve/openai_disagg_server.py
  • tests/integration/defs/accuracy/test_disaggregated_serving.py
  • tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py
  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tests/unittest/llmapi/test_llm_args.py
  • tensorrt_llm/llmapi/llm_args.py
🧠 Learnings (6)
📚 Learning: 2025-08-06T13:58:07.506Z
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

Applied to files:

  • docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md
  • examples/disaggregated/slurm/simple_example/launch.slurm
  • tensorrt_llm/llmapi/llm_args.py
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md
  • examples/cpp/executor/README.md
  • tensorrt_llm/llmapi/llm_args.py
  • examples/disaggregated/README.md
📚 Learning: 2025-07-22T09:22:14.726Z
Learnt from: yechank-nvidia
PR: NVIDIA/TensorRT-LLM#6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()` is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call `strip_for_generation()` to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.

Applied to files:

  • docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md
  • benchmarks/cpp/README.md
  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • examples/disaggregated/slurm/simple_example/launch.slurm
  • examples/disaggregated/README.md
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md
  • benchmarks/cpp/README.md
  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • examples/disaggregated/slurm/simple_example/launch.slurm
  • tensorrt_llm/llmapi/llm_args.py
  • examples/disaggregated/README.md
📚 Learning: 2025-07-17T09:01:27.402Z
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

Applied to files:

  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tensorrt_llm/llmapi/llm_args.py
📚 Learning: 2025-08-09T02:04:49.580Z
Learnt from: Fridah-nv
PR: NVIDIA/TensorRT-LLM#6760
File: tensorrt_llm/_torch/auto_deploy/models/quant_config_reader.py:81-98
Timestamp: 2025-08-09T02:04:49.580Z
Learning: In TensorRT-LLM's auto_deploy module, torch.dtype values in configuration dictionaries must be stored as string representations (e.g., "float16" instead of torch.float16) because OmegaConf.merge does not support torch.dtype types. These string representations are converted to actual torch.dtype objects in downstream code.

Applied to files:

  • tensorrt_llm/llmapi/llm_args.py
🪛 markdownlint-cli2 (0.17.2)
examples/cpp/executor/README.md

129-129: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

examples/disaggregated/README.md

28-28: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


45-45: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


66-66: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


99-99: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


113-113: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


131-131: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


148-148: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


163-163: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


174-174: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


191-191: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.12.2)
tests/unittest/llmapi/test_llm_args.py

669-669: CacheTransceiverConfig may be undefined, or defined from star imports

(F405)


677-677: CacheTransceiverConfig may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (46)
tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse.yaml (2)

18-18: LGTM on backend standardization

Using uppercase backend literals aligns with the updated type constraints and improves consistency across configs and tests.

Also applies to: 33-33


18-18: Standardize backend literals to uppercase across all test configurations

The search uncovered remaining lowercase backend values in several integration test YAML files. Please update these entries to their uppercase equivalents:

  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_ucx.yaml
    • Line 12: backend: "ucx"backend: "UCX"
    • Line 20: backend: "ucx"backend: "UCX"

  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_nixl.yaml
    • Line 12: backend: "nixl"backend: "NIXL"
    • Line 20: backend: "nixl"backend: "NIXL"

  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_mpi.yaml
    • Line 12: backend: "mpi"backend: "MPI"
    • Line 20: backend: "mpi"backend: "MPI"

Example diff:

-    backend: "ucx"
+    backend: "UCX"

After updating these, re-run the lowercase-literal search to confirm no lowercase backends remain.

Likely an incorrect or invalid review comment.

examples/disaggregated/disagg_config.yaml (1)

14-14: LGTM on backend standardization

Already quoted; aligns with the new uppercase literal set and keeps YAML style consistent.

Also applies to: 22-22

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1.yaml (1)

13-13: LGTM on backend standardization

Uppercase value matches the updated type constraints used by CacheTransceiverConfig.

Also applies to: 21-21

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1_trt_backend.yaml (1)

11-11: LGTM on backend standardization

Uppercase matches the new accepted literals; no other changes needed.

Also applies to: 19-19

examples/disaggregated/slurm/benchmark/gen_yaml.py (2)

200-201: Uppercasing backend to "DEFAULT" — consistent with new standard.

Matches the uppercase literal convention introduced across the codebase. No issues.


228-229: Uppercasing backend to "DEFAULT" — consistent with new standard.

Matches the uppercase literal convention introduced across the codebase. No issues.

examples/disaggregated/slurm/simple_example/gen_extra-llm-api-config.yaml (1)

1-3: UCX backend and buffer size look good.

Config aligns with the uppercase backend convention and adds a reasonable buffer size. LGTM.

tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance.yaml (2)

24-24: Backend literal standardized to "DEFAULT" — OK.


38-38: Backend literal standardized to "DEFAULT" — OK.

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_two_mtp.yaml (2)

17-17: Backend literal standardized to "DEFAULT" — OK.


28-28: Backend literal standardized to "DEFAULT" — OK.

tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse_deepseek_v3.yaml (2)

18-18: Backend literal standardized to "DEFAULT" — OK.


33-33: Backend literal standardized to "DEFAULT" — OK.

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap.yaml (2)

14-14: Backend literal standardized to DEFAULT — looks correct

Matches the updated uppercase literals and is consistent across context and generation server configs.

Also applies to: 24-24


14-14: No lowercase “default” remains in backend assignments
Verified via a case-insensitive search across all YAML and code files—every backend: entry uses uppercase DEFAULT. No mixed-case or lowercase default usages were found.

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_genpp2.yaml (1)

19-19: Consistent capitalization of backend (DEFAULT) — OK

Aligned with updated type validation. No other changes required here.

Also applies to: 34-34

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one.yaml (1)

14-14: Backend set to DEFAULT for both roles — OK

Matches the new allowed literals and stays consistent across sections.

Also applies to: 23-23

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp.yaml (1)

17-17: Updated backend to DEFAULT — looks good

Consistent with the standardized uppercase values and surrounding configs.

Also applies to: 26-26

tests/integration/defs/disaggregated/test_configs/disagg_config_mixed.yaml (1)

13-13: DEFAULT backend capitalization applied — good

Change is minimal and consistent across context and generation servers.

Also applies to: 21-21

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one_mtp.yaml (1)

17-17: Uppercase backend value standardization LGTM.

Aligns with the updated literals in code/tests. No functional concerns.

Also applies to: 26-26

tests/integration/defs/disaggregated/test_configs/disagg_config_conditional.yaml (1)

21-21: Uppercase backend value standardization LGTM.

Consistent with updated accepted literals; test config looks correct.

Also applies to: 36-36

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite.yaml (1)

13-13: Standardization to uppercase DEFAULT is correct.

Matches the new allowed values; no issues spotted.

Also applies to: 21-21

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp4_genpp4.yaml (1)

19-19: LGTM on backend casing change.

Consistent with the updated enum-like literals across configs.

Also applies to: 34-34

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp_attention_dp_overlap.yaml (1)

17-17: Backend normalization to DEFAULT looks good.

No functional issues with the change.

Also applies to: 27-27

tests/integration/defs/disaggregated/test_configs/disagg_config_overlap.yaml (1)

19-19: LGTM: standardized backend literal to DEFAULT

Matches the new uppercase convention and remains valid YAML.

Also applies to: 34-34

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2pp2_gentp2pp2.yaml (1)

19-19: LGTM: backend literal set to uppercase DEFAULT

Consistent with updated type annotations and other configs.

Also applies to: 34-34

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite.yaml (1)

13-13: LGTM: uppercase DEFAULT applied

Change is correct and consistent with the broader refactor.

Also applies to: 21-21

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap_cuda_graph.yaml (1)

13-13: LGTM: backend literal standardized to DEFAULT

Aligned with the new accepted values; no issues spotted.

Also applies to: 25-25

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_overlap_cuda_graph.yaml (1)

12-12: Backend literal normalized to DEFAULT — looks correct

Matches the uppercase literal convention enforced in code. No functional concerns.

Also applies to: 23-23

tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance_deepseek_v3.yaml (1)

20-20: Backend literal standardized to "DEFAULT"

Aligned with new accepted uppercase literals. Good update.

Also applies to: 36-36

tests/integration/defs/disaggregated/test_configs/disagg_config_torch_sampler.yaml (1)

19-19: Use of "DEFAULT" backend matches enforced literals

Change is correct and consistent with the API expectations.

Also applies to: 35-35

tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (1)

12-12: "DEFAULT" backend value normalization LGTM

Consistent with the new literal set and validation.

Also applies to: 21-21

benchmarks/cpp/README.md (1)

339-342: Env var rename to UCX backend is correct and consistent.

Switching to TRTLLM_USE_UCX_KVCACHE=1 matches the new transceiver backend naming and the rest of the docs. LGTM.

Also applies to: 347-351

examples/disaggregated/slurm/simple_example/ctx_extra-llm-api-config.yaml (1)

1-6: Config looks good and aligns with disaggregated context constraints.

UCX backend and disabling overlap scheduler are appropriate here. No issues found.

examples/wide_ep/slurm_scripts/README.md (1)

20-20: Path updates look right.

References now point to examples/disaggregated/benchmark/slurm. LGTM.

Also applies to: 38-40

examples/cpp/executor/README.md (2)

127-127: UCX env var requirement is correct.

Using TRTLLM_USE_UCX_KVCACHE=1 for disaggregated executor is consistent with the new backend naming. Good change.

Also applies to: 130-134


127-127: Remove all remaining TRTLLM_USE_MPI_KVCACHE references

Stale mentions of the deprecated MPI-based KV cache flag were found in the following locations. Please update or remove them (e.g., switch to TRTLLM_USE_UCX_KVCACHE where appropriate):

  • tests/integration/defs/triton_server/test_triton_llm.py (≈line 3380)
  • tests/integration/defs/disaggregated/test_disaggregated.py (line 701)
  • tests/integration/defs/cpp/test_e2e.py (line 48)
  • tests/integration/defs/cpp/test_multi_gpu.py (line 35)
  • tests/integration/defs/triton_server/test.sh (lines 1130, 1145)
  • tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py (line 44)
  • cpp/tensorrt_llm/common/envUtils.cpp (line 271)
  • examples/disaggregated/README.md (line 304)
  • docker/common/install_mpi4py.sh (lines 34, 47, 50)
⛔ Skipped due to learnings
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{h,hpp} : Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Learnt from: sklevtsov-nvidia
PR: NVIDIA/TensorRT-LLM#3294
File: cpp/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion/sm90_visitor_scatter.hpp:36-36
Timestamp: 2025-08-08T05:06:31.537Z
Learning: CUTLASS extension files (under cpp/tensorrt_llm/cutlass_extensions/) follow CUTLASS coding style conventions, including using #pragma once instead of TRTLLM_ prefixed header guards, even though they are .hpp files.
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Learnt from: yiqingy0
PR: NVIDIA/TensorRT-LLM#5198
File: jenkins/mergeWaiveList.py:0-0
Timestamp: 2025-07-22T08:33:49.109Z
Learning: In the TensorRT-LLM waive list merging system, removed lines are always located at the end of the merge waive lists, which is why the mergeWaiveList.py script uses reverse traversal - it's an optimization for this specific domain constraint.
tests/integration/defs/disaggregated/test_disaggregated_etcd.py (1)

247-249: Backend casing updated to 'DEFAULT' — good.

Matches the standardized uppercase convention.

Also applies to: 255-257

examples/disaggregated/slurm/simple_example/disagg_config.yaml (1)

4-4: Incorrect conflation of model backend with cache transceiver backend

The top-level backend: pytorch in examples/disaggregated/slurm/simple_example/disagg_config.yaml drives which inference engine to launch (PyTorch vs. TRT), not the KV‐cache transceiver. Cache transceiver settings live under each server’s cache_transceiver_config section. You can safely leave or rename the top-level backend field according to its own schema (e.g. backend: pytorch).

Likely an incorrect or invalid review comment.

tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (3)

277-278: LGTM: consistent use of uppercase "DEFAULT"

Aligned with new accepted literals.


380-381: LGTM: consistent use of uppercase "DEFAULT"

Aligned with new accepted literals.


134-135: LGTM: backend standardized to "DEFAULT"
Verified there are no remaining lowercase "default" or "ucx" references in the repo.

tests/integration/defs/accuracy/test_disaggregated_serving.py (2)

247-248: Consistent backend literal update to "DEFAULT" — LGTM

All these updates align with the new uppercase convention. No functional concerns.

Also applies to: 256-258, 291-293, 331-333, 339-341, 382-385, 396-398, 535-536, 575-576, 624-626, 631-633, 675-677, 682-684


426-429: Lowercase "default" backend literals missed; update to "DEFAULT" for consistency and validation compatibility

A few cache_transceiver_config backends still use "default" (lowercase). Please update to "DEFAULT" to match the new accepted values and prevent config parsing/validation mismatches.

Apply this diff:

@@
         ctx_server_config = {
             "disable_overlap_scheduler": True,
             "guided_decoding_backend": backend,
             "cache_transceiver_config": {
-                "backend": "default"
+                "backend": "DEFAULT"
             }
         }
         gen_server_config = {
             "guided_decoding_backend": backend,
             "cache_transceiver_config": {
-                "backend": "default"
+                "backend": "DEFAULT"
             }
         }
@@
         ctx_server_config = {
             "disable_overlap_scheduler": True,
             "speculative_config": speculative_decoding_config,
             "kv_cache_config": {
                 "free_gpu_memory_fraction": 0.8,
             },
             "guided_decoding_backend": backend,
             "cache_transceiver_config": {
-                "backend": "default"
+                "backend": "DEFAULT"
             }
         }
         gen_server_config = {
             "disable_overlap_scheduler": True,
             "speculative_config": speculative_decoding_config,
             "kv_cache_config": {
                 "free_gpu_memory_fraction": 0.8,
             },
             "guided_decoding_backend": backend,
             "cache_transceiver_config": {
-                "backend": "default"
+                "backend": "DEFAULT"
             }
         }

Also applies to: 431-435, 474-477, 485-488

Likely an incorrect or invalid review comment.

examples/disaggregated/README.md (1)

24-32: Good: example config uses uppercase backends and disables overlap on context

The examples align with the new uppercase backend convention and the current limitation on overlap scheduler for context servers.

Also applies to: 35-40

@@ -277,7 +277,7 @@ We also conducted performance evaluations of Qwen 3 on GB200 GPUs. The data indi

### Reproducing Steps

- We provide a set of scripts to reproduce the performance data presented in this paper. Please refer to the usage instructions described in [this document](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated/slurm).
+ We provide a set of scripts to reproduce the performance data presented in this paper. Please refer to the usage instructions described in [this document](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated/benchmark/slurm).

💡 Verification agent

🧩 Analysis chain

Updated link path — please verify the URL resolves

The path now points to examples/disaggregated/benchmark/slurm. Verify it exists and renders as expected.


🏁 Script executed:

#!/bin/bash
set -euo pipefail
URL="https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated/benchmark/slurm"
# Expect: HTTP 200 for HTML content
curl -sI "$URL" | sed -n '1p;/^content-type:/Ip'

Length of output: 297


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Locate any directory named "slurm" in the repo to verify the correct path
fd -t d slurm

Length of output: 97


Fix broken link to Slurm scripts in blog post

The link to examples/disaggregated/benchmark/slurm returns a 404. Update it to point to the existing examples/disaggregated/slurm directory.

• File: docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md
• Line: 280

Suggested diff:

- [this document](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated/benchmark/slurm)
+ [this document](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated/slurm)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- We provide a set of scripts to reproduce the performance data presented in this paper. Please refer to the usage instructions described in [this document](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated/benchmark/slurm).
+ We provide a set of scripts to reproduce the performance data presented in this paper. Please refer to the usage instructions described in [this document](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated/slurm).
🤖 Prompt for AI Agents
In docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md at
line 280, the link to the Slurm scripts is broken because it points to a
non-existent directory. Update the URL from
"examples/disaggregated/benchmark/slurm" to "examples/disaggregated/slurm" to
fix the 404 error and ensure the link directs to the correct existing directory.

Comment on lines 198 to 202
'cache_transceiver_config': {
'max_tokens_in_buffer': cache_transceiver_max_num_tokens,
- 'backend': 'default',
+ 'backend': 'DEFAULT',
},
},

💡 Verification agent

🧩 Analysis chain

Sanity check: ensure no lingering lowercase backend literals remain.

As a follow-up, verify no configs or code still use backend: default/ucx/etc. in lowercase.

Run:

Also applies to: 226-229


🏁 Script executed:

#!/bin/bash
# Find lingering lowercase backend literals in YAML/Python/JSON/TOML
rg -n --no-heading -e $'backend:\\s*default\\b' -e $'backend:\\s*ucx\\b' -e $'backend:\\s*mpi\\b' -e $'backend:\\s*nixl\\b' -e $'\"backend\"\\s*:\\s*\"default\"' -e $'\\'backend\\'\\s*:\\s*\\'default\\'' -e $'backend:\\s*\"default\"'

Length of output: 616


Update lingering lowercase backend literals in tests

The search uncovered lingering lowercase "backend": "default" entries in the integration tests. Please update these to use the uppercase literal ("DEFAULT") for consistency:

  • tests/integration/defs/accuracy/test_disaggregated_serving.py:427
  • tests/integration/defs/accuracy/test_disaggregated_serving.py:433
  • tests/integration/defs/accuracy/test_disaggregated_serving.py:475
  • tests/integration/defs/accuracy/test_disaggregated_serving.py:486
🤖 Prompt for AI Agents
In examples/disaggregated/slurm/benchmark/gen_yaml.py around lines 198 to 202,
update any occurrences of the lowercase string "default" used as a backend
literal to the uppercase "DEFAULT" to maintain consistency. Search through the
specified integration test files for similar lowercase literals and replace them
with the uppercase version as well.
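
For reference, a hypothetical one-off helper (not part of this PR) that would perform the replacement in the flagged test file; the file list and regex are assumptions for illustration:

import re
from pathlib import Path

# File flagged in the review as still containing lowercase backend literals.
FILES = ["tests/integration/defs/accuracy/test_disaggregated_serving.py"]
# Matches "backend": "default" / "ucx" / "nixl" / "mpi" in Python dict literals.
PATTERN = re.compile(r'("backend"\s*:\s*")(default|ucx|nixl|mpi)(")')

for path in map(Path, FILES):
    text = path.read_text()
    updated = PATTERN.sub(lambda m: m.group(1) + m.group(2).upper() + m.group(3), text)
    if updated != text:
        path.write_text(updated)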

Comment on lines +5 to +12
context_servers:
num_instances: 1
urls:
- "ctx_hostname:8001"
generation_servers:
num_instances: 1
urls:
- "gen_hostname:8002"

🛠️ Refactor suggestion

Consider embedding per-server cache_transceiver_config blocks

Other configs in this PR place the transceiver backend under each server type. If this proxy config is intended to route disaggregated requests, consider aligning structure:

  • context_servers.cache_transceiver_config.backend: DEFAULT
  • generation_servers.cache_transceiver_config.backend: DEFAULT

Example structure:

hostname: localhost
port: 8000
context_servers:
  num_instances: 1
  urls:
    - "ctx_hostname:8001"
  cache_transceiver_config:
    backend: DEFAULT
generation_servers:
  num_instances: 1
  urls:
    - "gen_hostname:8002"
  cache_transceiver_config:
    backend: DEFAULT

Happy to propose a concrete patch once the schema is confirmed.

🤖 Prompt for AI Agents
In examples/disaggregated/slurm/simple_example/disagg_config.yaml around lines 5
to 12, the configuration lacks per-server cache_transceiver_config blocks under
context_servers and generation_servers. To fix this, embed a
cache_transceiver_config section with a backend key set to DEFAULT under both
context_servers and generation_servers, aligning the structure with other
configs in the PR and supporting proper routing of disaggregated requests.

Comment on lines +16 to +21
srun --container-image=${container_image} \
     --container-mounts=${mount_paths} \
     -N 2 --ntasks-per-node=4 \
     --mpi=pmix \
     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${ctx_port} --extra_llm_api_options ${work_path}/ctx_extra-llm-api-config.yaml"


⚠️ Potential issue

srun will block; background long-running services or they’ll never reach subsequent steps

As written, this srun will block until the context service exits, so the generation and proxy steps won’t run. Run long-lived services concurrently.

Apply this diff:

 srun --container-image=${container_image} \
      --container-mounts=${mount_paths} \
      -N 2 --ntasks-per-node=4 \
      --mpi=pmix \
-     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${ctx_port} --extra_llm_api_options ${work_path}/ctx_extra-llm-api-config.yaml"
+     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${ctx_port} --extra_llm_api_options ${work_path}/ctx_extra-llm-api-config.yaml" &

Optional hardening at top of the script:

 #!/bin/bash
+set -euo pipefail
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-srun --container-image=${container_image} \
-     --container-mounts=${mount_paths} \
-     -N 2 --ntasks-per-node=4 \
-     --mpi=pmix \
-     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${ctx_port} --extra_llm_api_options ${work_path}/ctx_extra-llm-api-config.yaml"
+#!/bin/bash
+set -euo pipefail
+srun --container-image=${container_image} \
+     --container-mounts=${mount_paths} \
+     -N 2 --ntasks-per-node=4 \
+     --mpi=pmix \
+     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${ctx_port} --extra_llm_api_options ${work_path}/ctx_extra-llm-api-config.yaml" &
🤖 Prompt for AI Agents
In examples/disaggregated/slurm/simple_example/launch.slurm around lines 16 to
21, the srun command is currently blocking, preventing subsequent steps from
running. To fix this, modify the srun command to run the long-lived service in
the background by appending an ampersand (&) at the end of the command string.
This allows the script to continue executing subsequent commands concurrently.
Optionally, add script hardening at the top of the file as suggested.
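A further hardening idea, sketched below and not part of the original script: poll the servers before launching the proxy so it does not race their startup. This assumes trtllm-serve exposes a /health endpoint and reuses the placeholder hostnames from disagg_config.yaml; adjust the probe if the actual readiness check differs.

# Sketch only: wait for both backgrounded servers before starting the proxy.
# ctx_hostname/gen_hostname are the placeholders from disagg_config.yaml.
for url in "http://ctx_hostname:${ctx_port}/health" "http://gen_hostname:${gen_port}/health"; do
  until curl -sf "${url}" > /dev/null; do
    echo "waiting for ${url} ..."
    sleep 10
  done
done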

Comment on lines +22 to +28
# Launch a generation with `tp_size=4` using one 4-GPU node.
srun --container-image=${container_image} \
     --container-mounts=${mount_paths} \
     -N 1 --ntasks-per-node=4 \
     --mpi=pmix \
     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${gen_port} --extra_llm_api_options ${work_path}/gen_extra-llm-api-config.yaml"


🛠️ Refactor suggestion

Comment says tp_size=4 but command uses --tp_size 8 — make them consistent

Either adjust the comment or the command. If you intend tp=4 on a single 4-GPU node, update the flag accordingly.

Apply this diff:

-# Launch a generation with `tp_size=4` using one 4-GPU node.
+# Launch a generation with `tp_size=4` using one 4-GPU node.
 srun --container-image=${container_image} \
      --container-mounts=${mount_paths} \
      -N 1 --ntasks-per-node=4 \
      --mpi=pmix \
-     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${gen_port} --extra_llm_api_options ${work_path}/gen_extra-llm-api-config.yaml"
+     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 4 --host 0.0.0.0 --port ${gen_port} --extra_llm_api_options ${work_path}/gen_extra-llm-api-config.yaml" &

Finally, wait for backgrounded services (keep proxy in foreground to hold the allocation):

- srun --container-image=${container_image} \
+ srun --container-image=${container_image} \
      --container-mounts=${mount_paths} \
      -N 1 --ntasks-per-node=1 \
      --mpi=pmix \
      bash -c "trtllm-llmapi-launch trtllm-serve disaggregated -c ${work_path}/disagg_config.yaml"
+wait
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-# Launch a generation with `tp_size=4` using one 4-GPU node.
-srun --container-image=${container_image} \
-     --container-mounts=${mount_paths} \
-     -N 1 --ntasks-per-node=4 \
-     --mpi=pmix \
-     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${gen_port} --extra_llm_api_options ${work_path}/gen_extra-llm-api-config.yaml"
+# Launch a generation with `tp_size=4` using one 4-GPU node.
+srun --container-image=${container_image} \
+     --container-mounts=${mount_paths} \
+     -N 1 --ntasks-per-node=4 \
+     --mpi=pmix \
+     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 4 --host 0.0.0.0 --port ${gen_port} --extra_llm_api_options ${work_path}/gen_extra-llm-api-config.yaml" &
+srun --container-image=${container_image} \
+     --container-mounts=${mount_paths} \
+     -N 1 --ntasks-per-node=1 \
+     --mpi=pmix \
+     bash -c "trtllm-llmapi-launch trtllm-serve disaggregated -c ${work_path}/disagg_config.yaml"
+wait
🤖 Prompt for AI Agents
In examples/disaggregated/slurm/simple_example/launch.slurm around lines 22 to
28, the comment states tp_size=4 but the command uses --tp_size 8, causing
inconsistency. To fix this, update the --tp_size flag in the command to 4 to
match the comment, ensuring the command launches with tp_size=4 on a single
4-GPU node.
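One way to keep the comment and the flag from drifting apart is to derive tp_size from the step geometry instead of hard-coding it; a hypothetical sketch (illustrative variable names, assuming one GPU per task):

# Hypothetical sketch: compute tp_size from the srun step shape so the comment,
# the allocation, and the flag stay consistent.
gen_nodes=1
gen_tasks_per_node=4
gen_tp_size=$(( gen_nodes * gen_tasks_per_node ))   # 4 for one 4-GPU node
srun --container-image=${container_image} \
     --container-mounts=${mount_paths} \
     -N ${gen_nodes} --ntasks-per-node=${gen_tasks_per_node} \
     --mpi=pmix \
     bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size ${gen_tp_size} --host 0.0.0.0 --port ${gen_port} --extra_llm_api_options ${work_path}/gen_extra-llm-api-config.yaml" &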

@@ -16,7 +16,7 @@ context_servers:
   enable_partial_reuse: False
   disable_overlap_scheduler: True
   cache_transceiver_config:
-    backend: default
+    backend: DEFAULT

💡 Verification agent

🧩 Analysis chain

Standardization to DEFAULT acknowledged; ensure backward compatibility and consistent quoting.

  • Aligns with the new uppercase convention.
  • Suggest quoting for consistency with other quoted string scalars in the same file.

Scan for missed lowercase occurrences:

Suggested change:

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 34-34


🏁 Script executed:

#!/bin/bash
rg -n -S $'backend:\\s*(\"|\\\')?(default|ucx|nixl|mpi)(\"|\\\')?'

Length of output: 194


I wasn’t able to confirm whether any lowercase backend values remain due to the regex error. Let’s rerun with a corrected pattern:


🏁 Script executed:

#!/bin/bash
# Search for any unquoted or lowercase backend values in YAML configs
rg -n -S -e 'backend:\s*(["\'])?(default|ucx|nixl|mpi)\1?' --glob '*.yaml' tests/integration/defs/disaggregated/

Length of output: 295


We need a corrected regex (Rust regex doesn’t support backreferences). Let’s rerun:


🏁 Script executed:

#!/bin/bash
# Verify no lowercase or unquoted backend values exist in YAML configs
rg -n -S -e "backend:\s*['\"]?(default|ucx|nixl|mpi)['\"]?" --glob '*.yaml' tests/integration/defs/disaggregated/

Length of output: 8747


Quote and standardize all backend: values in disaggregated test configs

We’ve found dozens of unquoted uses of DEFAULT (and some other backends) across
tests/integration/defs/disaggregated/test_configs/*.yaml. To ensure consistency:

  • Use uppercase for all backend identifiers (DEFAULT, UCX, NIXL, MPI).
  • Always wrap the value in double quotes:
    -    backend: DEFAULT
    +    backend: "DEFAULT"
  • Apply the same quoting rule for other backends:
    -    backend: ucx
    +    backend: "UCX"
    -    backend: nixl
    +    backend: "NIXL"
    -    backend: mpi
    +    backend: "MPI"

Suggested one-liner to catch remaining unquoted entries:

rg -n -S --glob '*.yaml' '^\s*backend:\s*[A-Z]+' tests/integration/defs/disaggregated/test_configs/

And a bulk fix (run from repo root):

find tests/integration/defs/disaggregated/test_configs/ -name '*.yaml' \
  -exec sed -i -E 's/^( *)(backend:\s*)([A-Z]+)$/\1\2"\3"/' {} +
🤖 Prompt for AI Agents
In
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_gentp2.yaml
at line 19, the backend value DEFAULT is unquoted. To fix this, update the
backend value to be uppercase and wrap it in double quotes, changing backend:
DEFAULT to backend: "DEFAULT". Apply this quoting and uppercase standard
consistently to all backend values in the disaggregated test configs directory.

@@ -16,7 +16,7 @@ context_servers:
   enable_partial_reuse: False
   disable_overlap_scheduler: True
   cache_transceiver_config:
-    backend: default
+    backend: DEFAULT

💡 Verification agent

❓ Verification inconclusive

Consistent with uppercase convention; minor quoting nit and migration check.

  • Change is correct per the new allowed literals.
  • Recommend quoting for consistency with other string fields.

Find missed lowercase:

Diff:

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 34-34


🏁 Script executed:

#!/bin/bash
rg -n -S $'backend:\\s*(\"|\\\')?(default|ucx|nixl|mpi)(\"|\\\')?'

Length of output: 194


🏁 Script executed:

#!/usr/bin/env bash
# Search for any occurrences of lowercase backend values (quoted or unquoted) in YAML configs
rg -n -E "backend:\s*(['\"])?(default|ucx|nixl|mpi)\1" --glob "*.yaml"

Length of output: 471


#!/usr/bin/env bash
set -euo pipefail

echo "Searching for any lowercase backend values in YAML files (unquoted, single-quoted, or double-quoted)..."
for val in default ucx nixl mpi; do
  echo "=== Checking unquoted: backend: $val ==="
  grep -R --include='*.yaml' -n "backend: $val" . || true

  echo "=== Checking double-quoted: backend: \"$val\" ==="
  grep -R --include='*.yaml' -n "backend: \"$val\"" . || true

  echo "=== Checking single-quoted: backend: '$val' ==="
  grep -R --include='*.yaml' -n "backend: '$val'" . || true
done

Quote uppercase backend literal for consistency

To align with other string fields, please add quotes around the uppercase DEFAULT values in your YAML:

• tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml
– Line 19: change backend: DEFAULTbackend: "DEFAULT"
– Line 34: change backend: DEFAULTbackend: "DEFAULT"

Diff:

--- a/tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml
+++ b/tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml
@@ -19,1 +19,1 @@
-    backend: DEFAULT
+    backend: "DEFAULT"
@@ -34,1 +34,1 @@
-    backend: DEFAULT
+    backend: "DEFAULT"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    backend: DEFAULT
+    backend: "DEFAULT"
🤖 Prompt for AI Agents
In
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml
at lines 19 and 34, the uppercase string literal DEFAULT under the backend key
should be enclosed in quotes for consistency with other string fields. Update
both lines to use backend: "DEFAULT" instead of backend: DEFAULT.

@@ -11,7 +11,7 @@ context_servers:
   pipeline_parallel_size: 1
   enable_attention_dp: True
   cache_transceiver_config:
-    backend: default
+    backend: DEFAULT

💡 Verification agent

🧩 Analysis chain

Backend value updated; consider quoting and verify no lowercase remains elsewhere.

  • Uppercase DEFAULT matches the updated literals; looks good.
  • For stylistic consistency with other quoted strings in the file, consider quoting.

Repo-wide check:

Patch:

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 23-23


🏁 Script executed:

#!/bin/bash
rg -n -S $'backend:\\s*(\"|\\\')?(default|ucx|nixl|mpi)(\"|\\\')?'

Length of output: 194


To catch any unquoted or lowercase default occurrences, let’s retry with a simpler pattern. Please run:


🏁 Script executed:

#!/bin/bash
# Search for any backend: DEFAULT, default, ucx, nixl, or mpi (quoted or unquoted)
rg -n "backend:\s*(\"|')?(DEFAULT|default|ucx|nixl|mpi)(\"|')?"

Length of output: 9011


Quote DEFAULT in YAML backend entries for consistency

Multiple test config files use an unquoted DEFAULT value for the backend key. To match the style of other quoted literals (e.g. "ucx", "nixl", "mpi"), please wrap DEFAULT in quotes.

Files to update (unquoted backend: DEFAULT occurrences):

  • tests/integration/defs/disaggregated/test_configs/**/*.yaml
  • examples/disaggregated/disagg_config.yaml (lines 14, 22)

Example patch:

-    backend: DEFAULT
+    backend: "DEFAULT"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    backend: DEFAULT
+    backend: "DEFAULT"
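Worth noting that the quotes are purely stylistic in YAML: both spellings load to the same Python string, so this is a consistency cleanup rather than a behavioral change. A quick illustrative check (snippet is not from the repo):

import yaml

# Unquoted and double-quoted scalars parse to the identical string value,
# so quoting DEFAULT only affects style, not behavior.
unquoted = yaml.safe_load('backend: DEFAULT')
quoted = yaml.safe_load('backend: "DEFAULT"')
assert unquoted == quoted == {'backend': 'DEFAULT'}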
🤖 Prompt for AI Agents
In
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp.yaml
at line 14, the backend value DEFAULT is unquoted. To maintain consistency with
other YAML files where backend values like "ucx" or "mpi" are quoted, update
this line to quote DEFAULT as "DEFAULT". Also, apply this quoting style to all
other unquoted DEFAULT backend entries in the specified test config YAML files
and examples/disaggregated/disagg_config.yaml at lines 14 and 22.

@@ -16,7 +16,7 @@ context_servers:
   batch_sizes: [1,3000]
   disable_overlap_scheduler: True
   cache_transceiver_config:
-    backend: default
+    backend: DEFAULT

💡 Verification agent

🧩 Analysis chain

Uppercase backend adopted; add quotes for consistency and verify global sweep.

  • Matches PR objective to standardize backend literals.
  • Optional: quote the value to align with other quoted scalars (e.g., "pytorch").

Sweep for lowercase literals:

Localized change:

-    backend: DEFAULT
+    backend: "DEFAULT"

Also applies to: 37-37


🏁 Script executed:

#!/bin/bash
rg -n -S $'backend:\\s*(\"|\\\')?(default|ucx|nixl|mpi)(\"|\\\')?'

Length of output: 194


🏁 Script executed:

#!/bin/bash
# Scan all YAML files for lowercase backend values
for val in default ucx nixl mpi; do
  echo "Searching for backend: ${val}"
  rg -n -S "backend:\s*${val}" --glob '*.yaml'
done

Length of output: 7846


Quote uppercase backend scalars for consistency

A global sweep found only uppercase backend: values—no lowercase literals—so please wrap all of them in quotes to align with other YAML scalars.

• Files to update:

  • All tests/integration/defs/disaggregated/test_configs/*.yaml entries of backend: DEFAULT
  • examples/disaggregated/slurm/*.yaml entries of backend: UCX

Example diff:

-    backend: DEFAULT
+    backend: "DEFAULT"
-  backend: UCX
+  backend: "UCX"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    backend: DEFAULT
+    backend: "DEFAULT"
🤖 Prompt for AI Agents
In
tests/integration/defs/disaggregated/test_configs/disagg_config_cuda_graph_padding.yaml
at line 19, the backend scalar DEFAULT is unquoted. To maintain consistency with
other YAML scalars, wrap the value DEFAULT in double quotes, changing backend:
DEFAULT to backend: "DEFAULT". Apply this quoting style to all similar uppercase
backend values in the specified YAML files.

Comment on lines +247 to 249
backend: "DEFAULT"
max_tokens_in_buffer: 2048"""


💡 Verification agent

🧩 Analysis chain

Sanity-check: ensure no lingering lowercase 'default' backends remain.

Also confirm the args/parser now expects/normalizes uppercase values.

Also applies to: 255-257


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Find any lowercase backend: default"
rg -n --hidden --no-ignore 'backend:\s*"?default"?' || echo "No lowercase 'default' found (good)."

echo "Find backend enum/choices references (manual inspection target):"
rg -n --hidden --no-ignore 'CacheTransceiver|cache_transceiver|backend' tensorrt_llm | sed -n '1,200p'

Length of output: 22273


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the init_backend validator to see if it normalizes/validates uppercase backend values.
rg -n "init_backend" -A 10 -B 3 tensorrt_llm/llmapi/llm_args.py

Length of output: 529


Sanity-check passed & parser requires uppercase backends
No lowercase "default" backends remain. The init_backend validator in tensorrt_llm/llmapi/llm_args.py (around line 2121) currently returns the raw value, so it will only accept uppercase inputs (e.g. "DEFAULT").

If you’d like to support lowercase too, update that validator to:

@field_validator('backend', mode='before')
def init_backend(cls, v):
    if v is None:
        return 'pytorch'
    return v.upper()

Otherwise, ensure your docs and tests explicitly state that backend names are case-sensitive and must be uppercase.

• tests/integration/defs/disaggregated/test_disaggregated_etcd.py (lines 247-249, 255-257) already use "DEFAULT" correctly.
• tensorrt_llm/llmapi/llm_args.py@2121: adjust init_backend as shown above.
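For illustration, here is a standalone sketch (hypothetical class name, not the real CacheTransceiverConfig) of how a mode='before' validator combined with uppercase Literal values would accept lowercase input while keeping the uppercase form canonical:

from typing import Literal, Optional

from pydantic import BaseModel, field_validator

# Standalone sketch only: uppercase literals are the canonical values, and the
# before-validator normalizes whatever casing the user supplied.
class TransceiverConfigSketch(BaseModel):
    backend: Optional[Literal["DEFAULT", "UCX", "NIXL", "MPI"]] = None

    @field_validator("backend", mode="before")
    @classmethod
    def normalize_backend(cls, v):
        return v.upper() if isinstance(v, str) else v

print(TransceiverConfigSketch(backend="ucx").backend)      # UCX
print(TransceiverConfigSketch(backend="DEFAULT").backend)  # DEFAULT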

🤖 Prompt for AI Agents
In tests/integration/defs/disaggregated/test_disaggregated_etcd.py around lines
247 to 249, the backend value is correctly uppercase as "DEFAULT". To ensure
consistent handling of backend names, update the init_backend validator in
tensorrt_llm/llmapi/llm_args.py near line 2121 to convert any input to uppercase
by returning v.upper(), and default to 'pytorch' if None. This change will allow
the validator to accept lowercase inputs by normalizing them to uppercase,
aligning with the test and documentation expectations.

@tensorrt-cicd

PR_Github #14673 [ run ] triggered by Bot

@tensorrt-cicd

PR_Github #14673 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11074 completed with status: 'FAILURE'

@kaiyux kaiyux requested a review from qiaoxj07 August 10, 2025 03:30