# Expert Configuration of LLM API

For advanced TensorRT-LLM users, the full set of `tensorrt_llm._torch.auto_deploy.llm_args.LlmArgs` is exposed. Use at your own risk; the argument list may diverge from the standard TRT-LLM argument list.

- All configuration fields used by the AutoDeploy core pipeline, `InferenceOptimizer`, are exposed exclusively in `AutoDeployConfig` in `tensorrt_llm._torch.auto_deploy.llm_args`. Please refer to those fields first.
- For advanced users, the full set of `LlmArgs` in `tensorrt_llm._torch.auto_deploy.llm_args` can be used to configure the AutoDeploy `LLM` API, including runtime options.
- Note that some fields in the full `LlmArgs` object are overlapping, duplicated, and/or _ignored_ in AutoDeploy. This applies particularly to arguments that configure the model itself, since AutoDeploy's model ingestion and optimization pipeline differs significantly from the default manual workflow in TensorRT-LLM.
- However, with proper care the full `LlmArgs` object can be used to configure advanced runtime options in TensorRT-LLM.
- Any valid field can simply be provided as a keyword argument (`**kwargs`) to the AutoDeploy `LLM` API.

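As a minimal sketch, the keyword-argument pattern looks like the following. The field names are taken from the options used elsewhere in this document; the import path in the comments is an assumption and may differ in your installation:

```python
# Hypothetical sketch: forward valid LlmArgs fields as **kwargs to the
# AutoDeploy LLM API. The commented-out import path is an assumption.
llm_kwargs = {
    "world_size": 2,                 # runtime option
    "compile_backend": "torch-opt",  # compile backend selection
    "attn_backend": "flashinfer",    # attention backend selection
}

# from tensorrt_llm._torch.auto_deploy import LLM  # assumed import path
# llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", **llm_kwargs)
```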
# Expert Configuration of `build_and_run_ad.py`

For advanced users, `build_and_run_ad.py` provides additional configuration capabilities through a flexible argument parser powered by Pydantic Settings and OmegaConf. You can use dot notation for CLI arguments, provide multiple YAML configuration files, and rely on well-defined configuration precedence rules to create complex deployment configurations.

## CLI Arguments with Dot Notation

The script supports flexible CLI argument parsing using dot notation to modify nested configurations dynamically. You can target any field in both the `ExperimentConfig` in `examples/auto_deploy/build_and_run_ad.py` and the nested `AutoDeployConfig` or `LlmArgs` objects in `tensorrt_llm._torch.auto_deploy.llm_args`:

```bash
# Configure model parameters
# NOTE: config values like num_hidden_layers are automatically resolved into the appropriate
# nested dict value ``{"args": {"model_kwargs": {"num_hidden_layers": 10}}}`` even though the
# nesting is not spelled out explicitly in the CLI arg
python build_and_run_ad.py \
  --model "meta-llama/Meta-Llama-3.1-8B-Instruct" \
  --args.model-kwargs.num-hidden-layers=10 \
  --args.model-kwargs.hidden-size=2048 \
  --args.tokenizer-kwargs.padding-side=left

# Configure runtime and backend options
python build_and_run_ad.py \
  --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0" \
  --args.world-size=2 \
  --args.compile-backend=torch-opt \
  --args.attn-backend=flashinfer

# Configure prompting and benchmarking
python build_and_run_ad.py \
  --model "microsoft/phi-4" \
  --prompt.batch-size=4 \
  --prompt.sp-kwargs.max-tokens=200 \
  --prompt.sp-kwargs.temperature=0.7 \
  --benchmark.enabled=true \
  --benchmark.bs=8 \
  --benchmark.isl=1024
```
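
The dot-notation resolution described above can be sketched in plain Python. This is a simplified illustration of the semantics, not the actual parser: the real CLI also normalizes hyphens to underscores (e.g., `model-kwargs` becomes `model_kwargs`), which is assumed to have already happened here:

```python
def to_nested(dotted_args):
    """Expand {"args.model_kwargs.num_hidden_layers": 10} into nested dicts."""
    result = {}
    for dotted_key, value in dotted_args.items():
        node = result
        *parents, leaf = dotted_key.split(".")
        for key in parents:
            # descend, creating intermediate dicts as needed
            node = node.setdefault(key, {})
        node[leaf] = value
    return result

cli = {
    "args.model_kwargs.num_hidden_layers": 10,
    "args.model_kwargs.hidden_size": 2048,
}
print(to_nested(cli))
# {'args': {'model_kwargs': {'num_hidden_layers': 10, 'hidden_size': 2048}}}
```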

## YAML Configuration Files

Both `ExperimentConfig` and `AutoDeployConfig`/`LlmArgs` inherit from `DynamicYamlMixInForSettings`, which enables you to provide multiple YAML configuration files that are automatically deep-merged at runtime.

Create a YAML configuration file (e.g., `my_config.yaml`):

```yaml
# my_config.yaml
args:
  model_kwargs:
    num_hidden_layers: 12
    hidden_size: 1024
  world_size: 4
  compile_backend: torch-compile
  attn_backend: triton
  max_seq_len: 2048
  max_batch_size: 16
  transforms:
    sharding:
      strategy: auto
    quantization:
      enabled: false

prompt:
  batch_size: 8
  sp_kwargs:
    max_tokens: 150
    temperature: 0.8
    top_k: 50

benchmark:
  enabled: true
  num: 20
  bs: 4
  isl: 1024
  osl: 256
```

Create an additional override file (e.g., `production.yaml`):

```yaml
# production.yaml
args:
  world_size: 8
  compile_backend: torch-opt
  max_batch_size: 32

benchmark:
  enabled: false
```

Then use these configurations:

```bash
# Using a single YAML config
python build_and_run_ad.py \
  --model "meta-llama/Meta-Llama-3.1-8B-Instruct" \
  --yaml-configs my_config.yaml

# Using multiple YAML configs (deep-merged in order; later files have higher priority)
python build_and_run_ad.py \
  --model "meta-llama/Meta-Llama-3.1-8B-Instruct" \
  --yaml-configs my_config.yaml production.yaml

# Targeting the nested AutoDeployConfig with a separate YAML file
python build_and_run_ad.py \
  --model "meta-llama/Meta-Llama-3.1-8B-Instruct" \
  --yaml-configs my_config.yaml \
  --args.yaml-configs autodeploy_overrides.yaml
```

## Configuration Precedence and Deep Merging

The configuration system follows a precedence order in which higher-priority sources override lower-priority ones:

1. **CLI Arguments** (highest priority) - Direct command line arguments
1. **YAML Configs** - Files specified via `--yaml-configs` and `--args.yaml-configs`
1. **Default Settings** (lowest priority) - Built-in defaults from the config classes

**Deep Merging**: Unlike simple overwriting, deep merging recursively combines nested dictionaries. For example:

```yaml
# Base config
args:
  model_kwargs:
    num_hidden_layers: 10
    hidden_size: 1024
  max_seq_len: 2048
```

```yaml
# Override config
args:
  model_kwargs:
    hidden_size: 2048  # This will override
    # num_hidden_layers: 10 remains unchanged
  world_size: 4  # This gets added
```
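
The merge semantics above can be sketched with a minimal recursive deep-merge in Python. This is an illustration of the behavior, not the actual OmegaConf implementation:

```python
def deep_merge(base, override):
    """Recursively merge `override` into `base`, returning a new dict.

    Nested dicts are combined key by key; scalar values in `override`
    replace the corresponding values in `base`; new keys are added.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# The two YAML fragments above, expressed as dicts:
base = {"args": {"model_kwargs": {"num_hidden_layers": 10, "hidden_size": 1024},
                 "max_seq_len": 2048}}
override = {"args": {"model_kwargs": {"hidden_size": 2048}, "world_size": 4}}

merged = deep_merge(base, override)
assert merged["args"]["model_kwargs"] == {"num_hidden_layers": 10, "hidden_size": 2048}
assert merged["args"]["max_seq_len"] == 2048  # untouched keys survive
assert merged["args"]["world_size"] == 4      # new keys are added
```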

**Nested Config Behavior**: When using nested configurations, outer YAML configuration files become initialization settings for inner objects, giving them higher precedence:

```bash
# The outer yaml-configs affects the entire ExperimentConfig
# The inner args.yaml-configs affects only the AutoDeployConfig
python build_and_run_ad.py \
  --model "meta-llama/Meta-Llama-3.1-8B-Instruct" \
  --yaml-configs experiment_config.yaml \
  --args.yaml-configs autodeploy_config.yaml \
  --args.world-size=8  # CLI override beats both YAML configs
```

## Built-in Default Configuration

Both the `AutoDeployConfig` and `LlmArgs` classes automatically load a built-in `default.yaml` configuration file that provides defaults for the AutoDeploy inference optimizer pipeline. This file is specified in the `_get_config_dict()` function in `tensorrt_llm._torch.auto_deploy.llm_args` and defines default transform configurations for graph optimization stages.

The built-in defaults are automatically merged with your configurations at the lowest priority level, ensuring that your custom settings always override the defaults. You can inspect the current default configuration to understand the baseline transform pipeline:

```bash
# View the default configuration
cat tensorrt_llm/_torch/auto_deploy/config/default.yaml

# Override specific transform settings
python build_and_run_ad.py \
  --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0" \
  --args.transforms.export-to-gm.strict=true
```