Commit 801e81e

[TRTLLM-5930][doc] 1.0 Documentation.
Signed-off-by: nv-guomingz <[email protected]>
1 parent 09038be

34 files changed: +3452 -122 lines
docs/source/commands/trtllm-eval.rst

Lines changed: 86 additions & 0 deletions
trtllm-eval
===========

About
-----

The ``trtllm-eval`` command provides developers with a unified entry point for accuracy evaluation. It shares the core evaluation logic with the `accuracy test suite <https://github.com/NVIDIA/TensorRT-LLM/tree/main/tests/integration/defs/accuracy>`_ of TensorRT-LLM.

``trtllm-eval`` is built on the offline LLM API. Compared to the online ``trtllm-serve``, the offline API provides clearer error messages and simplifies the debugging workflow.

The following tasks are currently supported:
.. list-table::
   :header-rows: 1
   :widths: 20 25 15 15 15

   * - Dataset
     - Task
     - Metric
     - Default ISL
     - Default OSL
   * - CNN Dailymail
     - summarization
     - rouge
     - 924
     - 100
   * - MMLU
     - QA; multiple choice
     - accuracy
     - 4,094
     - 2
   * - GSM8K
     - QA; regex matching
     - accuracy
     - 4,096
     - 256
   * - GPQA
     - QA; multiple choice
     - accuracy
     - 32,768
     - 4,096
   * - JSON mode eval
     - structured generation
     - accuracy
     - 1,024
     - 512
Usage and Examples
------------------

Some evaluation tasks (e.g., GSM8K and GPQA) depend on the ``lm_eval`` package. To run these tasks, install the development requirements:

.. code-block:: bash

   pip install -r requirements-dev.txt

Alternatively, install only the ``lm_eval`` version pinned in ``requirements-dev.txt``.

Here are some examples:

.. code-block:: bash

   # Evaluate Llama-3.1-8B-Instruct on MMLU
   trtllm-eval --model meta-llama/Llama-3.1-8B-Instruct mmlu

   # Evaluate Llama-3.1-8B-Instruct on GSM8K
   trtllm-eval --model meta-llama/Llama-3.1-8B-Instruct gsm8k

   # Evaluate Llama-3.3-70B-Instruct on GPQA Diamond
   trtllm-eval --model meta-llama/Llama-3.3-70B-Instruct gpqa_diamond

The ``--model`` argument accepts either a Hugging Face model ID or a local checkpoint path. By default, ``trtllm-eval`` runs the model with the PyTorch backend; pass ``--backend tensorrt`` to switch to the TensorRT backend.

The ``--model`` argument also accepts a local path to pre-built TensorRT engines. In that case, pass the Hugging Face tokenizer path to the ``--tokenizer`` argument.

For more details, see ``trtllm-eval --help`` and ``trtllm-eval <task> --help``.
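The engine-path workflow above can be sketched end to end. This is a minimal sketch: the engine directory is a hypothetical placeholder, and the tokenizer ID is reused from the earlier examples.

.. code-block:: bash

   # Hypothetical directory containing pre-built TensorRT engines
   ENGINE_DIR=/workspace/engines/llama-3.1-8b-instruct

   # Point --model at the engine directory; engines do not bundle a
   # tokenizer, so pass the Hugging Face tokenizer explicitly
   trtllm-eval --model "$ENGINE_DIR" \
       --tokenizer meta-llama/Llama-3.1-8B-Instruct \
       mmlu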
Syntax
------

.. click:: tensorrt_llm.commands.eval:main
   :prog: trtllm-eval
   :nested: full

docs/source/conf.py

Lines changed: 12 additions & 0 deletions

@@ -105,6 +105,18 @@
 container published for a previous
 [GitHub pre-release or release](https://github.com/NVIDIA/TensorRT-LLM/releases)
 (see also [NGC Catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags)).
+```
+""",
+"trtllm_serve_tag_admonition":
+r"""
+```{admonition} trtllm-serve requests
+:class: dropdown note
+If you are running trtllm-serve inside a Docker container, you have two options for sending API requests:
+1. Expose port 8000 to access the server from outside the container.
+2. Open a new terminal and use the following command to directly attach to the running container:
+```bash
+docker exec -it <container_id> bash
+```
 ```
 """,
 }
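The admonition added above tells readers how to reach a containerized `trtllm-serve`. Once port 8000 is exposed, a request might look like the following sketch; the model name is an illustrative placeholder, and an OpenAI-compatible `/v1/chat/completions` route on `localhost:8000` is assumed.

```bash
# Assumes trtllm-serve is already running and reachable on localhost:8000;
# the model name is illustrative
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'
```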
Lines changed: 11 additions & 0 deletions

Model Recipes
================

.. toctree::
   :maxdepth: 1
   :caption: Model Recipes
   :name: Model Recipes

   quick-start-recipe-for-deepseek-r1-on-trtllm.md
   quick-start-recipe-for-llama3.3-70b-on-trtllm.md
   quick-start-recipe-for-llama4-scout-on-trtllm.md

examples/models/core/deepseek_v3/quick-start-recipe-for-deepseek-r1-on-trt-llm.md renamed to docs/source/deployment-guide/quick-start-recipe-for-deepseek-r1-on-trtllm.md

Lines changed: 1 addition & 1 deletion

The container tag is bumped from 1.0.0rc5 to 1.0.0rc6:

@@ -35,7 +35,7 @@ docker run --rm -it \
   -p 8000:8000 \
   -v ~/.cache:/root/.cache:rw \
   --name tensorrt_llm \
-  nvcr.io/nvidia/tensorrt-llm/release:1.0.0rc5 \
+  nvcr.io/nvidia/tensorrt-llm/release:1.0.0rc6 \
   /bin/bash
```
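To pick up the bumped tag before rerunning the `docker run` command above, the new image can be pulled explicitly. A minimal sketch, assuming access to the NGC registry:

```bash
# Fetch the rc6 release image referenced by the updated recipe
docker pull nvcr.io/nvidia/tensorrt-llm/release:1.0.0rc6
```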
