Commit 801e81e

[TRTLLM-5930][doc] 1.0 Documentation.
Signed-off-by: nv-guomingz <[email protected]>
1 parent 09038be

34 files changed: +3452 -122 lines
docs/source/commands/trtllm-eval.rst

Lines changed: 86 additions & 0 deletions
trtllm-eval
===========

About
-----

The ``trtllm-eval`` command provides developers with a unified entry point for accuracy evaluation. It shares the core evaluation logic with the `accuracy test suite <https://github.com/NVIDIA/TensorRT-LLM/tree/main/tests/integration/defs/accuracy>`_ of TensorRT-LLM.

``trtllm-eval`` is built on the offline LLM API. Compared to the online ``trtllm-serve``, the offline API provides clearer error messages and simplifies the debugging workflow.

The following tasks are currently supported:
.. list-table::
   :header-rows: 1
   :widths: 20 25 15 15 15

   * - Dataset
     - Task
     - Metric
     - Default ISL
     - Default OSL
   * - CNN Dailymail
     - summarization
     - rouge
     - 924
     - 100
   * - MMLU
     - QA; multiple choice
     - accuracy
     - 4,094
     - 2
   * - GSM8K
     - QA; regex matching
     - accuracy
     - 4,096
     - 256
   * - GPQA
     - QA; multiple choice
     - accuracy
     - 32,768
     - 4,096
   * - JSON mode eval
     - structured generation
     - accuracy
     - 1,024
     - 512
Usage and Examples
------------------

Some evaluation tasks (e.g., GSM8K and GPQA) depend on the ``lm_eval`` package. To run these tasks, install the development requirements:

.. code-block:: bash

   pip install -r requirements-dev.txt

Alternatively, install only the ``lm_eval`` version pinned in ``requirements-dev.txt``.

Here are some examples:

.. code-block:: bash

   # Evaluate Llama-3.1-8B-Instruct on MMLU
   trtllm-eval --model meta-llama/Llama-3.1-8B-Instruct mmlu

   # Evaluate Llama-3.1-8B-Instruct on GSM8K
   trtllm-eval --model meta-llama/Llama-3.1-8B-Instruct gsm8k

   # Evaluate Llama-3.3-70B-Instruct on GPQA Diamond
   trtllm-eval --model meta-llama/Llama-3.3-70B-Instruct gpqa_diamond

The ``--model`` argument accepts either a Hugging Face model ID or a local checkpoint path. By default, ``trtllm-eval`` runs the model with the PyTorch backend; pass ``--backend tensorrt`` to switch to the TensorRT backend.

The ``--model`` argument also accepts a local path to pre-built TensorRT engines. In that case, pass the Hugging Face tokenizer path to the ``--tokenizer`` argument.

For more details, see ``trtllm-eval --help`` and ``trtllm-eval <task> --help``.
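The engine-path workflow above can be sketched end to end. This is a minimal sketch: the engine directory is a hypothetical placeholder, and the tokenizer ID is reused from the earlier examples.

.. code-block:: bash

   # Hypothetical directory containing pre-built TensorRT engines
   ENGINE_DIR=/workspace/engines/llama-3.1-8b-instruct

   # Point --model at the engine directory; engines do not bundle a
   # tokenizer, so pass the Hugging Face tokenizer explicitly
   trtllm-eval --model "$ENGINE_DIR" \
       --tokenizer meta-llama/Llama-3.1-8B-Instruct \
       mmlu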
Syntax
------

.. click:: tensorrt_llm.commands.eval:main
   :prog: trtllm-eval
   :nested: full

docs/source/conf.py

Lines changed: 12 additions & 0 deletions

@@ -105,6 +105,18 @@
 container published for a previous
 [GitHub pre-release or release](https://github.com/NVIDIA/TensorRT-LLM/releases)
 (see also [NGC Catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags)).
+```
+""",
+"trtllm_serve_tag_admonition":
+r"""
+```{admonition} trtllm-serve requests
+:class: dropdown note
+If you are running trtllm-serve inside a Docker container, you have two options for sending API requests:
+1. Expose port 8000 to access the server from outside the container.
+2. Open a new terminal and use the following command to directly attach to the running container:
+```bash
+docker exec -it <container_id> bash
+```
 ```
 """,
 }
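The admonition added above tells readers how to reach a containerized `trtllm-serve`. Once port 8000 is exposed, a request might look like the following sketch; the model name is an illustrative placeholder, and an OpenAI-compatible `/v1/chat/completions` route on `localhost:8000` is assumed.

```bash
# Assumes trtllm-serve is already running and reachable on localhost:8000;
# the model name is illustrative
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'
```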
Lines changed: 11 additions & 0 deletions

Model Recipes
================

.. toctree::
   :maxdepth: 1
   :caption: Model Recipes
   :name: Model Recipes

   quick-start-recipe-for-deepseek-r1-on-trtllm.md
   quick-start-recipe-for-llama3.3-70b-on-trtllm.md
   quick-start-recipe-for-llama4-scout-on-trtllm.md

examples/models/core/deepseek_v3/quick-start-recipe-for-deepseek-r1-on-trt-llm.md renamed to docs/source/deployment-guide/quick-start-recipe-for-deepseek-r1-on-trtllm.md

Lines changed: 1 addition & 1 deletion

The container tag is bumped from 1.0.0rc5 to 1.0.0rc6:

@@ -35,7 +35,7 @@ docker run --rm -it \
   -p 8000:8000 \
   -v ~/.cache:/root/.cache:rw \
   --name tensorrt_llm \
-  nvcr.io/nvidia/tensorrt-llm/release:1.0.0rc5 \
+  nvcr.io/nvidia/tensorrt-llm/release:1.0.0rc6 \
   /bin/bash
```
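To pick up the bumped tag before rerunning the `docker run` command above, the new image can be pulled explicitly. A minimal sketch, assuming access to the NGC registry:

```bash
# Fetch the rc6 release image referenced by the updated recipe
docker pull nvcr.io/nvidia/tensorrt-llm/release:1.0.0rc6
```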
