[#7136][feat] trtllm-serve + autodeploy integration #7141
Conversation
Signed-off-by: Suyog Gupta <[email protected]>
📝 Walkthrough

Adds an AutoDeploy backend branch to the serve CLI runtime path and a pre-validator for `max_seq_len` in the AutoDeploy LLM args.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
participant User
participant CLI as trtllm-serve CLI
participant Serve as serve.py
participant AutoDeploy as AutoDeployLLM
participant PyTorch as PyTorch LLM
participant Server as HTTP Server
User->>CLI: trtllm-serve --backend _autodeploy ...
CLI->>Serve: parse args
Serve->>Serve: get_llm_args()
alt backend == "_autodeploy"
Serve->>Serve: llm_args.pop("build_config", None)
Serve->>Serve: llm_args["kv_cache_config"].enable_block_reuse = False
Serve->>AutoDeploy: AutoDeployLLM(**llm_args)
AutoDeploy-->>Serve: AutoDeploy LLM instance
else backend == "pytorch"
Serve->>PyTorch: instantiate PyTorch LLM (existing flow)
PyTorch-->>Serve: PyTorch LLM instance
end
Serve->>Server: start HTTP server with LLM
User->>Server: /v1/chat/completions
    Server-->>User: response
```
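In code form, the dispatch shown in the diagram reduces to a small branch. The sketch below is an illustration only (not the merged serve.py); `pytorch_llm_cls`, `trt_llm_cls`, and `autodeploy_llm_cls` are hypothetical stand-ins for the PyTorchLLM, LLM, and AutoDeployLLM classes discussed in the review.

```python
# Illustrative sketch of the backend dispatch above; not the actual serve.py code.
# The class parameters are stand-ins for PyTorchLLM, LLM (TensorRT engine path),
# and AutoDeployLLM from the PR discussion.
def build_llm(backend, llm_args, pytorch_llm_cls, trt_llm_cls, autodeploy_llm_cls):
    if backend == "pytorch":
        return pytorch_llm_cls(**llm_args)
    if backend == "_autodeploy":
        # AutoDeploy has no build_config and does not support KV-cache block reuse yet.
        llm_args.pop("build_config", None)
        llm_args["kv_cache_config"].enable_block_reuse = False
        return autodeploy_llm_cls(**llm_args)
    # Any other value falls through to the TensorRT engine path.
    return trt_llm_cls(**llm_args)
```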
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs
Suggested reviewers
Signed-off-by: Suyog Gupta <[email protected]>
Actionable comments posted: 1
🧹 Nitpick comments (4)
tensorrt_llm/_torch/auto_deploy/llm_args.py (1)
277-286: Max-seq-len validator: confirm intent vs. CLI docs; simplify default lookup

The fallback to `AutoDeployConfig`'s default when `max_seq_len` is None may contradict the serve CLI help ("If unspecified, deduced from the model config"). If the desired behavior for AutoDeploy is "use model config if unspecified," consider deferring resolution to the factory/model, or document that AutoDeploy uses a concrete default (currently 512).

Two actionable tweaks:

- Use `cls.model_fields` to fetch the default, which reads better with inheritance (a generic pydantic sketch of this pattern follows the serve.py nitpicks below).
- Optionally guard against accidental string inputs (pydantic will coerce, but explicitness helps).

Minimal diff:

```diff
-    def ensure_max_seq_len(cls, value: Any, info: ValidationInfo) -> Any:
+    def ensure_max_seq_len(cls, value: Any, info: ValidationInfo) -> Any:
         if value is None:
-            # Fallback to the AutoDeployConfig default when not provided
-            return AutoDeployConfig.model_fields["max_seq_len"].get_default(
+            # Fallback to the class default when not provided
+            return cls.model_fields["max_seq_len"].get_default(
                 call_default_factory=True
             )
         return value
```

If the intent is "prefer model-config value when available," I can propose a small change to resolve `max_seq_len` during `create_factory()` by querying the loaded model config when the field is None. Do you want that?

tensorrt_llm/commands/serve.py (3)
113-115: Normalize backend consistently and keep 'trt' explicit to match CLI choices

You convert unknown backends to `None`, which works with the fallback path, but the CLI accepts "trt". Keep "trt" explicit to reduce ambiguity and aid logging/telemetry.

Apply this diff:

```diff
-    backend = backend if backend in ["pytorch", "_autodeploy"] else None
+    backend = backend if backend in ["pytorch", "trt", "_autodeploy"] else "trt"
```

This preserves the TensorRT path when users pass "trt" (or when an unknown value is provided, we default to "trt" rather than `None`). If you prefer strictness, raise on unknown instead.

Also applies to: 144-145
168-172: Avoid mutating args in place; replace print with logger; make build_config removal safe

- Use `logger` instead of a bare `print` for consistency.
- `del llm_args["build_config"]` can KeyError if callers pre-strip it; prefer `pop(..., None)`.
- Consider logging only the backend selection (args may contain large objects or sensitive paths).

Apply this diff:

```diff
-    elif backend == '_autodeploy':
-        print(f"Using AutoDeploy backend with args: {llm_args}")
-        del llm_args["build_config"]
-        llm = AutoDeployLLM(**llm_args)
+    elif backend == '_autodeploy':
+        logger.info("Using AutoDeploy backend")
+        llm_args.pop("build_config", None)
+        llm = AutoDeployLLM(**llm_args)
```
212-215: Update help text to include 'trt' behavior for clarity

The choices include "trt" but the help text doesn't document it.

Apply this diff:

```diff
-@click.option("--backend",
-              type=click.Choice(["pytorch", "trt", "_autodeploy"]),
-              default="pytorch",
-              help="Set to 'pytorch' for pytorch path and '_autodeploy' for autodeploy path. Default is pytorch path.")
+@click.option("--backend",
+              type=click.Choice(["pytorch", "trt", "_autodeploy"]),
+              default="pytorch",
+              help="Backend to use: 'pytorch' for PyTorch, 'trt' for TensorRT-LLM engine, or '_autodeploy' for AutoDeploy. Default: 'pytorch'.")
```
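Returning to the `max_seq_len` validator comment above, here is the suggested `cls.model_fields` fallback pattern in isolation. This is a generic pydantic v2 sketch; `MyConfig` and its default of 512 are illustrative, not the repo's classes.

```python
# Generic pydantic v2 sketch: a "before" validator that falls back to the class
# default via cls.model_fields, so subclasses that override the field keep their own default.
from typing import Any, Optional

from pydantic import BaseModel, field_validator


class MyConfig(BaseModel):
    max_seq_len: Optional[int] = 512  # illustrative default

    @field_validator("max_seq_len", mode="before")
    @classmethod
    def ensure_max_seq_len(cls, value: Any) -> Any:
        if value is None:
            # Look up the default on the class so inheritance is respected.
            return cls.model_fields["max_seq_len"].get_default(
                call_default_factory=True)
        return value


print(MyConfig(max_seq_len=None).max_seq_len)  # -> 512
```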
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- tensorrt_llm/_torch/auto_deploy/custom_ops/flashinfer_attention.py (0 hunks)
- tensorrt_llm/_torch/auto_deploy/llm_args.py (1 hunks)
- tensorrt_llm/_torch/auto_deploy/shim/demollm.py (1 hunks)
- tensorrt_llm/commands/serve.py (5 hunks)
💤 Files with no reviewable changes (1)
- tensorrt_llm/_torch/auto_deploy/custom_ops/flashinfer_attention.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py
: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in `__init__`
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else
Files:
tensorrt_llm/_torch/auto_deploy/shim/demollm.py
tensorrt_llm/commands/serve.py
tensorrt_llm/_torch/auto_deploy/llm_args.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
tensorrt_llm/_torch/auto_deploy/shim/demollm.py
tensorrt_llm/commands/serve.py
tensorrt_llm/_torch/auto_deploy/llm_args.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (1)
tensorrt_llm/commands/serve.py (1)
15-15: Import of AutoDeployLLM looks fine

Alias is clear and avoids collisions with other LLM classes. No action needed.
Signed-off-by: Suyog Gupta <[email protected]>
cc: @Fridah-nv for doc changes
LGTM.
Signed-off-by: Suyog Gupta <[email protected]>
Actionable comments posted: 1
🧹 Nitpick comments (6)
docs/source/torch/auto_deploy/advanced/serving_with_trtllm_serve.md (6)
15-17: Tighten wording for bullets under the launch command

Minor style/clarity improvements and consistent terminology.

```diff
-- `model`: HF name or local path
-- `--backend _autodeploy`: uses AutoDeploy runtime
+- `model`: Hugging Face model name or a local path
+- `--backend _autodeploy`: uses the AutoDeploy runtime
```
18-18: Add a note about differing ports

Readers may run the server on a non-default port; call this out next to the curl example.

```diff
 Once the server is ready, test with an OpenAI-compatible request:
+If you run the server on a different port (for example via `--port`), adjust the URL accordingly.
```
21-29: Make curl example fail fast on HTTP errors for easier debugging

Adding `-Ssf` surfaces server-side errors immediately, improving UX when the endpoint or payload is misconfigured.

```diff
-curl -s http://localhost:8000/v1/chat/completions \
+curl -Ssf http://localhost:8000/v1/chat/completions \
```
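For reference, the same request can be issued from Python with an equivalent fail-fast check. This is a minimal sketch assuming the quick-start model name and the default `localhost:8000` endpoint; the payload simply follows the OpenAI chat-completions schema.

```python
# Minimal Python equivalent of the curl example; raise_for_status() mirrors curl -Ssf.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello from AutoDeploy!"}],
        "max_tokens": 32,
    },
    timeout=60,
)
resp.raise_for_status()  # fail fast on HTTP errors
print(resp.json()["choices"][0]["message"]["content"])
```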
44-62: Document optional max_seq_len override to match backend behavior

Per the backend changes, `max_seq_len` defaults are ensured during validation. Exposing an optional override here will help users understand how to tune it.

```diff
 # Compilation backend for AutoDeploy
 compile_backend: torch-opt # options: torch-simple, torch-compile, torch-cudagraph, torch-opt
+
+# Sequence length (optional)
+# If omitted, AutoDeploy uses its configured default.
+# Uncomment to override:
+# max_seq_len: 4096
```
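If it helps readers, the option names above can also be generated programmatically. The sketch below writes such a config file from Python; the concrete values and the `--extra_llm_api_options` flag in the trailing comment are assumptions for illustration, not verified against the merged docs.

```python
# Sketch: build the AutoDeploy YAML discussed above from Python (values are examples).
import yaml

autodeploy_config = {
    "compile_backend": "torch-opt",   # torch-simple | torch-compile | torch-cudagraph | torch-opt
    "attn_backend": "flashinfer",
    "free_mem_ratio": 0.8,
    "cuda_graph_batch_sizes": [1, 8, 32],  # assumed example sizes
    "max_seq_len": 4096,              # optional override; omit to use the AutoDeploy default
}

with open("autodeploy_config.yaml", "w") as f:
    yaml.safe_dump(autodeploy_config, f)

# Assumed usage (flag name is an assumption):
# trtllm-serve meta-llama/Llama-3.1-8B-Instruct --backend _autodeploy \
#     --extra_llm_api_options autodeploy_config.yaml
```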
66-68: Polish "Limitations" phrasing (docs tone + clarity)

Use formal tone and expand "WIP".

```diff
-- KV cache block reuse is disabled automatically for AutoDeploy backend
-- AutoDeploy backend doesn't yet support disaggregated serving. WIP
+- KV-cache block reuse is automatically disabled when using the AutoDeploy backend.
+- The AutoDeploy backend does not yet support disaggregated serving (work in progress).
```
61-61: Optional: Mention dependency note for flashinfer

If `attn_backend: flashinfer` requires an extra install, a brief note/link would preempt setup issues.

I can add a one-liner with the install command or link if you confirm the exact dependency path/version you want to recommend.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- docs/source/torch/auto_deploy/advanced/serving_with_trtllm_serve.md (1 hunks)
- docs/source/torch/auto_deploy/auto-deploy.md (1 hunks)
- tensorrt_llm/commands/serve.py (5 hunks)
- tests/integration/defs/accuracy/test_llm_api_autodeploy.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/integration/defs/accuracy/test_llm_api_autodeploy.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tensorrt_llm/commands/serve.py
🧰 Additional context used
🪛 LanguageTool
docs/source/torch/auto_deploy/advanced/serving_with_trtllm_serve.md
[grammar] ~15-~15: There might be a mistake here.
Context: ... \ ``` - model
: HF name or local path - `--backend _autodeploy`: uses AutoDeploy runtime Once the serv...
(QB_NEW_EN)
[grammar] ~66-~66: There might be a mistake here.
Context: ...e block reuse is disabled automatically for AutoDeploy backend - AutoDeploy backend...
(QB_NEW_EN)
[grammar] ~67-~67: There might be a mistake here.
Context: ...t yet support disaggregated serving. WIP - For best performance: - Prefer `compil...
(QB_NEW_EN)
[grammar] ~68-~68: There might be a mistake here.
Context: ...ted serving. WIP - For best performance: - Prefer compile_backend: torch-opt
- ...
(QB_NEW_EN)
🪛 GitHub Actions: Release Checks
docs/source/torch/auto_deploy/advanced/serving_with_trtllm_serve.md
[error] 1-1: end-of-file-fixer: fixed trailing newline at end of file.
🔇 Additional comments (1)
docs/source/torch/auto_deploy/auto-deploy.md (1)
62-62: Nav link addition looks good; verify path resolution in built docs

The relative link target appears correct. Please confirm the sidebar renders this entry under Advanced Usage after the docs build.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tensorrt_llm/commands/serve.py (1)
332-351: Wire `moe_cluster_parallel_size` through `get_llm_args` in `serve.py`

The CLI flag `--cluster_size` is passed into `serve()` as `moe_cluster_parallel_size=cluster_size`, but the `get_llm_args` function in `tensorrt_llm/commands/serve.py` doesn't accept or propagate that parameter, so it's silently dropped.

• File: `tensorrt_llm/commands/serve.py`
  – At the `get_llm_args` signature (around line 74), add the new parameter.
  – Inside the returned `llm_args` dict, include the key/value pair.

Proposed patch:

```diff
 def get_llm_args(model: str,
                  tokenizer: Optional[str] = None,
                  backend: str = "pytorch",
                  max_beam_width: int = BuildConfig.max_beam_width,
                  max_batch_size: int = BuildConfig.max_batch_size,
                  max_num_tokens: int = BuildConfig.max_num_tokens,
                  max_seq_len: int = BuildConfig.max_seq_len,
                  tensor_parallel_size: int = 1,
                  pipeline_parallel_size: int = 1,
                  moe_expert_parallel_size: Optional[int] = None,
+                 moe_cluster_parallel_size: Optional[int] = None,
                  gpus_per_node: Optional[int] = None,
                  free_gpu_memory_fraction: Optional[float] = None,
                  mamba_ssm_cache_dtype: str = "auto",
                  num_postprocess_workers: int = 0,
                  trust_remote_code: bool = False,
                  reasoning_parser: Optional[str] = None,
                  fail_fast_on_attention_window_too_large: bool = False,
                  **llm_args_extra_dict: Any):
     …
     return {
         "pipeline_parallel_size": pipeline_parallel_size,
         "moe_expert_parallel_size": moe_expert_parallel_size,
+        "moe_cluster_parallel_size": moe_cluster_parallel_size,
         "gpus_per_node": gpus_per_node,
         "free_gpu_memory_fraction": free_gpu_memory_fraction,
         …
     }
```
♻️ Duplicate comments (1)
docs/source/torch/auto_deploy/advanced/serving_with_trtllm_serve.md (1)
10-13: Fix trailing backslash in the Quick start command (copy/paste breaks).

The last line ends with a continuation backslash but nothing follows, which will cause a shell error on copy/paste.

Apply this diff:

```diff
 trtllm-serve \
   meta-llama/Llama-3.1-8B-Instruct \
-  --backend _autodeploy \
+  --backend _autodeploy
```
🧹 Nitpick comments (5)
docs/source/torch/auto_deploy/advanced/serving_with_trtllm_serve.md (1)
66-73: Tighten wording in Limitations and tips for clarity and tone.

Minor grammar/style nits. Improves readability and aligns with formal doc tone.

```diff
-- KV cache block reuse is disabled automatically for AutoDeploy backend
-- AutoDeploy backend doesn't yet support disaggregated serving. WIP
+- KV cache block reuse is disabled automatically for the AutoDeploy backend.
+- The AutoDeploy backend doesn't yet support disaggregated serving (WIP).
 - For best performance:
   - Prefer `compile_backend: torch-opt`
   - Use `attn_backend: flashinfer`
   - Set realistic `cuda_graph_batch_sizes` that match expected traffic
   - Tune `free_mem_ratio` to 0.8–0.9
```

tensorrt_llm/commands/serve.py (4)
113-113: Backend normalization drops 'trt' explicitly; prefer explicit mapping for clarity.

Currently, any non-['pytorch','_autodeploy'] backend (including 'trt') becomes None and falls into the TRT path implicitly. This works but is confusing given the CLI exposes 'trt'.

```diff
-    backend = backend if backend in ["pytorch", "_autodeploy"] else None
+    # Normalize backend: keep explicit 'trt' instead of collapsing to None
+    backend = backend if backend in ["pytorch", "trt", "_autodeploy"] else None
```

Optionally, make the TRT branch explicit in launch_server (see separate comment).
215-220: Backend CLI help omits 'trt'; align help text with choices.

The Click choices include 'trt' but the help text only mentions 'pytorch' and '_autodeploy'.

```diff
-    "Set to 'pytorch' for pytorch path and '_autodeploy' for autodeploy path. Default is pytorch path."
+    "Set to 'pytorch' for the PyTorch path, 'trt' for the TensorRT engine path, or '_autodeploy' for the AutoDeploy path. Default is 'pytorch'."
```
164-176: Make TRT path explicit in launch_server for readability.

Optional: treat 'trt' explicitly rather than piggybacking on the fallback branch. This matches the CLI surface and future-proofs validation.

```diff
-    if backend == 'pytorch':
+    if backend == 'pytorch':
         llm = PyTorchLLM(**llm_args)
-    elif backend == '_autodeploy':
+    elif backend == '_autodeploy':
         ...
-    else:
+    elif backend == 'trt' or backend is None:
         llm = LLM(**llm_args)
+    else:
+        raise ValueError(f"Unsupported backend: {backend}")
```
37-71: Signal handler comment vs implementation mismatch; clarify and keep it minimal.

Comment says "Using print for safety" but code uses logger calls inside the handler. In Python, handlers run on the main thread, but keeping them minimal is still advisable. Either switch to `print` or update the comment to match the implementation, for example:

```diff
-    """Signal handler to clean up the child process."""
-    global _child_p_global
-    if _child_p_global and _child_p_global.poll() is None:
-        # Using print for safety in signal handlers
-        logger.info(
+    """Signal handler to clean up the child process (minimal work)."""
+    global _child_p_global
+    if _child_p_global and _child_p_global.poll() is None:
+        # Using logger here is acceptable in Python signal handlers (main thread), but keep work minimal.
+        logger.info(
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- docs/source/torch/auto_deploy/advanced/serving_with_trtllm_serve.md (1 hunks)
- tensorrt_llm/commands/serve.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py
: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in `__init__`
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else
Files:
tensorrt_llm/commands/serve.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
tensorrt_llm/commands/serve.py
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tensorrt_llm/commands/serve.py
🪛 LanguageTool
docs/source/torch/auto_deploy/advanced/serving_with_trtllm_serve.md
[grammar] ~15-~15: There might be a mistake here.
Context: ... \ ``` - model
: HF name or local path - `--backend _autodeploy`: uses AutoDeploy runtime Once the serv...
(QB_NEW_EN)
[grammar] ~66-~66: There might be a mistake here.
Context: ...e block reuse is disabled automatically for AutoDeploy backend - AutoDeploy backend...
(QB_NEW_EN)
[grammar] ~67-~67: There might be a mistake here.
Context: ...t yet support disaggregated serving. WIP - For best performance: - Prefer `compil...
(QB_NEW_EN)
[grammar] ~68-~68: There might be a mistake here.
Context: ...ted serving. WIP - For best performance: - Prefer compile_backend: torch-opt
- ...
(QB_NEW_EN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (3)
docs/source/torch/auto_deploy/advanced/serving_with_trtllm_serve.md (1)
74-78: Cross-link targets verified

The referenced "See also" links point to existing files in the docs source:
- docs/source/torch/auto_deploy/auto-deploy.md
- docs/source/torch/auto_deploy/advanced/benchmarking_with_trtllm_bench.md
No further action required.
tensorrt_llm/commands/serve.py (2)
17-17: Importing AutoDeployLLM is correct and scoped.

Alias-based import reads cleanly and avoids name collision with the TRT LLM.
672-695: Signal handlers are registered and restored correctly.

Good defensive pattern: save originals, register temporary handlers, restore in finally.
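As a standalone illustration of that pattern (save the originals, install temporary handlers, restore them in `finally`), here is a generic sketch that is independent of serve.py's actual cleanup logic:

```python
# Generic save/register/restore pattern for signal handlers; not serve.py's code.
import signal


def run_with_temporary_handlers(work):
    def _handler(signum, frame):
        # Keep the handler minimal; do real cleanup outside the handler.
        raise KeyboardInterrupt(f"received signal {signum}")

    originals = {
        signal.SIGINT: signal.getsignal(signal.SIGINT),
        signal.SIGTERM: signal.getsignal(signal.SIGTERM),
    }
    for sig in originals:
        signal.signal(sig, _handler)
    try:
        return work()
    finally:
        # Restore the original handlers even if work() raised.
        for sig, original in originals.items():
            signal.signal(sig, original)


if __name__ == "__main__":
    print(run_with_temporary_handlers(lambda: "done"))
```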
Signed-off-by: Suyog Gupta <[email protected]>
/bot run
PR_Github #16086 [ run ] triggered by Bot
PR_Github #16086 [ run ] completed with state
/bot run
PR_Github #16129 [ run ] triggered by Bot
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tensorrt_llm/commands/serve.py (2)
343-344: TypeError: `get_llm_args()` doesn't accept `moe_cluster_parallel_size` but CLI passes it.

This will throw "unexpected keyword argument" and likely explains the failed pipeline. Add the parameter and forward it in the returned dict.
Apply this diff to the function signature and args dict:
```diff
 def get_llm_args(model: str,
@@
-                 moe_expert_parallel_size: Optional[int] = None,
+                 moe_expert_parallel_size: Optional[int] = None,
+                 moe_cluster_parallel_size: Optional[int] = None,
@@
     llm_args = {
@@
         "moe_expert_parallel_size": moe_expert_parallel_size,
+        "moe_cluster_parallel_size":
+            moe_cluster_parallel_size,
```

No change needed at the call site after this; the current invocation will become valid.
1-1
: Include NVIDIA SPDX header in `serve.py`
Please prepend the following line at the very top of
tensorrt_llm/commands/serve.py
, before any imports:

```diff
+# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 import asyncio
```
This matches the style used across the repo for Python source files (e.g.
math_utils.py
,functional.py
) and updates the year range through 2025.
♻️ Duplicate comments (2)
docs/source/torch/auto_deploy/advanced/serving_with_trtllm_serve.md (1)
9-13: Trailing backslash fix looks good — copy/paste works now.

The command now terminates cleanly and won't error in shells.
tensorrt_llm/commands/serve.py (1)
168-175: Autodeploy: potential AttributeError on kv_cache_config when YAML merges.

If extras set `kv_cache_config` to a dict or None, `enable_block_reuse` attribute assignment will fail. Normalize first.

Apply this diff:

```diff
-    # TODO(https://github.com/NVIDIA/TensorRT-LLM/issues/7142):
-    # AutoDeploy does not support cache reuse yet.
-    llm_args["kv_cache_config"].enable_block_reuse = False
+    # AutoDeploy does not support cache reuse yet.
+    kv_cfg = llm_args.get("kv_cache_config")
+    if isinstance(kv_cfg, dict):
+        kv_cfg = KvCacheConfig(**kv_cfg)
+    if kv_cfg is None:
+        kv_cfg = KvCacheConfig()
+    kv_cfg.enable_block_reuse = False
+    llm_args["kv_cache_config"] = kv_cfg
```
🧹 Nitpick comments (6)
docs/source/torch/auto_deploy/advanced/serving_with_trtllm_serve.md (2)
15-16: Spell out "HF" on first use.

Minor clarity nit. Consider "Hugging Face (HF) name or local path."

Apply this diff:

```diff
-- `model`: HF name or local path
+- `model`: Hugging Face (HF) name or local path
```
66-73: Polish the "Limitations and tips" phrasing for consistency.

Tighten grammar, remove "WIP" from user-facing docs, and end list items with periods.

Apply this diff:

```diff
-- KV cache block reuse is disabled automatically for AutoDeploy backend
-- AutoDeploy backend doesn't yet support disaggregated serving. WIP
-- For best performance:
-  - Prefer `compile_backend: torch-opt`
-  - Use `attn_backend: flashinfer`
-  - Set realistic `cuda_graph_batch_sizes` that match expected traffic
-  - Tune `free_mem_ratio` to 0.8–0.9
+- KV cache block reuse is disabled automatically for the AutoDeploy backend.
+- The AutoDeploy backend doesn't yet support disaggregated serving.
+- For best performance:
+  - Prefer `compile_backend: torch-opt`.
+  - Use `attn_backend: flashinfer`.
+  - Set realistic `cuda_graph_batch_sizes` that match expected traffic.
+  - Tune `free_mem_ratio` to 0.8–0.9.
```

tensorrt_llm/commands/serve.py (4)
17-17: Import AutoDeploy LLM lazily to avoid hard dependency at module import time.

Keeps the PyTorch/TRT paths usable even if the AutoDeploy stack isn't available.

Apply this diff:

```diff
-from tensorrt_llm._torch.auto_deploy.llm import LLM as AutoDeployLLM
@@
-    elif backend == '_autodeploy':
+    elif backend == '_autodeploy':
+        # Import lazily to avoid hard dependency when not used.
+        from tensorrt_llm._torch.auto_deploy.llm import LLM as AutoDeployLLM
         # AutoDeploy does not support build_config
         llm_args.pop("build_config", None)
-        # TODO(https://github.com/NVIDIA/TensorRT-LLM/issues/7142):
-        # AutoDeploy does not support cache reuse yet.
-        llm_args["kv_cache_config"].enable_block_reuse = False
+        # AutoDeploy does not support cache reuse yet.
+        # Be robust to YAML merges that may set kv_cache_config to dict/None.
+        kv_cfg = llm_args.get("kv_cache_config")
+        if isinstance(kv_cfg, dict):
+            kv_cfg = KvCacheConfig(**kv_cfg)
+        if kv_cfg is None:
+            kv_cfg = KvCacheConfig()
+        kv_cfg.enable_block_reuse = False
+        llm_args["kv_cache_config"] = kv_cfg
         llm = AutoDeployLLM(**llm_args)
```

Also applies to: 168-175
113-114: Preserve explicit 'trt' backend instead of coercing to None.

Keeping the string improves clarity in logs/config echo while still falling through to the TRT path.

Apply this diff:

```diff
-    backend = backend if backend in ["pytorch", "_autodeploy"] else None
+    backend = backend if backend in ["pytorch", "_autodeploy", "trt"] else None
```
214-220
: Help text: mention the TRT path explicitly for completeness.Make it clear what “trt” does.
Apply this diff:
- "Set to 'pytorch' for pytorch path and '_autodeploy' for autodeploy path. Default is pytorch path." + "Set to 'pytorch' for the PyTorch path, 'trt' for the TensorRT engine path, and '_autodeploy' for the AutoDeploy path. Default is PyTorch."
265-268
: Align `--kv_cache_free_gpu_memory_fraction`
Defaults Across CLI and Documentation

The CLI currently defaults
--kv_cache_free_gpu_memory_fraction
to 0.9 (serve.py:265–266; eval.py:85–87), but several documentation examples use 0.8 as the default fraction:
- docs/source/deployment-guide/quick-start-recipe-for-deepseek-r1-on-trtllm.md (lines 117–118) shows
--kv_cache_free_gpu_memory_fraction 0.8
- docs/source/torch/auto_deploy/advanced/workflow.md (line 22) and serving/benchmarking guides set
free_mem_ratio: 0.8
- docs/source/torch/auto_deploy/advanced/serving_with_trtllm_serve.md (lines 55, 72) recommend tuning to 0.8–0.9 but use 0.8 in examples
At the same time, most quick-start recipes (llama3.3, llama4-scout) and tech blogs use 0.9, and scaffolding code defaults to 0.9 (worker.py:151).
To ensure a single source of truth, please choose one of the following:
Option A: Update Documentation to 0.9
– Change all examples that currently use 0.8 to 0.9
– In tuning guides, note that while the default is 0.9, users may adjust within the 0.8–0.9 range as neededOption B: Change CLI Default to 0.8
– Modify the default in serve.py and eval.py from 0.9 to 0.8
– Update scaffolding/worker.py’s default to 0.8 for consistency
– Clarify in docs that the CLI default is 0.8, with 0.9 still acceptable for higher GPU headroomPlease implement one of these approaches so that code defaults and documentation examples are in sync.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- docs/source/torch/auto_deploy/advanced/serving_with_trtllm_serve.md (1 hunks)
- tensorrt_llm/commands/serve.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py
: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in `__init__`
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else
Files:
tensorrt_llm/commands/serve.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
tensorrt_llm/commands/serve.py
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tensorrt_llm/commands/serve.py
🪛 LanguageTool
docs/source/torch/auto_deploy/advanced/serving_with_trtllm_serve.md
[grammar] ~15-~15: There might be a mistake here.
Context: ...oy ``` - model
: HF name or local path - `--backend _autodeploy`: uses AutoDeploy runtime Once the serv...
(QB_NEW_EN)
[grammar] ~66-~66: There might be a mistake here.
Context: ...e block reuse is disabled automatically for AutoDeploy backend - AutoDeploy backend...
(QB_NEW_EN)
[grammar] ~67-~67: There might be a mistake here.
Context: ...t yet support disaggregated serving. WIP - For best performance: - Prefer `compil...
(QB_NEW_EN)
[grammar] ~68-~68: There might be a mistake here.
Context: ...ted serving. WIP - For best performance: - Prefer compile_backend: torch-opt
- ...
(QB_NEW_EN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (1)
docs/source/torch/auto_deploy/advanced/serving_with_trtllm_serve.md (1)
74-77: Add cross-link to the Compile Backends section in the Support Matrix

Paths and anchors have been verified in `docs/source/torch/auto_deploy/support_matrix.md` (heading "Compile Backends" → anchor `#compile-backends`) and the relative path from this file is correct. Please apply the following update:

```diff
 ## See also
 - [AutoDeploy overview](../auto-deploy.md)
 - [Benchmarking with trtllm-bench](./benchmarking_with_trtllm_bench.md)
+- [Compilation backends support matrix](../support_matrix.md#compile-backends)
```
PR_Github #16129 [ run ] completed with state
LGTM for doc and trtllm-serve part changes.
Summary by CodeRabbit
Description
Test Coverage
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user friendly way for developers to interact with a Jenkins server.
Run `/bot [-h|--help]` to print this help message. See details below for each supported subcommand.
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]
Launch build/test pipelines. All previously running jobs will be killed.
- `--reuse-test (optional)pipeline-id` (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- `--disable-reuse-test` (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
- `--disable-fail-fast` (OPTIONAL) : Disable fail fast on build/tests/infra failures.
- `--skip-test` (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- `--stage-list "A10-PyTorch-1, xxx"` (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- `--gpu-type "A30, H100_PCIe"` (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- `--test-backend "pytorch, cpp"` (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- `--only-multi-gpu-test` (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--disable-multi-gpu-test` (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--add-multi-gpu-test` (OPTIONAL) : Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
- `--post-merge` (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- `--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"` (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- `--detailed-log` (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- `--debug` (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the `stage-list` parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see `docs/source/reference/ci-overview.md` and the `scripts/test_to_stage_mapping.py` helper.

kill

`kill`

Kill all running builds associated with pull request.

skip

`skip --comment COMMENT`

Skip testing for latest commit on pull request. `--comment "Reason for skipping build/test"` is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

`reuse-pipeline`

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.