Commit 80dfcb5

update contents

Signed-off-by: h-guo18 <[email protected]>
1 parent: c1e59fd

File tree: 2 files changed, +23 −5 lines

docs/source/torch/auto_deploy/auto-deploy.md (16 additions, 2 deletions)
```diff
@@ -63,5 +63,19 @@ The exported graph then undergoes a series of automated transformations, includi
 
 ## Roadmap
 
-Check out our [Github Project Board](https://github.com/orgs/NVIDIA/projects/83) to learn more about
-the current progress in AutoDeploy and where you can help.
+We are actively expanding AutoDeploy to support a broader range of model architectures and inference features.
+
+**Upcoming Model Support:**
+
+- Vision-Language Models (VLMs)
+
+- Structured State Space Models (SSMs) and Linear Attention architectures
+
+**Planned Features:**
+
+- Low-Rank Adaptation (LoRA)
+
+- Speculative Decoding for accelerated generation
+
+To track development progress and contribute, visit our [Github Project Board](https://github.com/orgs/NVIDIA/projects/83).
+We welcome community contributions; see our [`CONTRIBUTING.md`](../../../../../examples/auto_deploy/CONTRIBUTING.md) for guidelines.
```

docs/source/torch/auto_deploy/support_matrix.md (7 additions, 3 deletions)
```diff
@@ -8,6 +8,7 @@ The exported graph then undergoes a series of automated transformations, includi
 
 **Bring Your Own Model**: AutoDeploy leverages `torch.export` and dynamic graph pattern matching, enabling seamless integration for a wide variety of models without relying on hard-coded architectures.
 
+We support Hugging Face models that are compatible with `AutoModelForCausalLM` and `AutoModelForImageTextToText`.
 Additionally, we have officially verified support for the following models:
 
 <details>
@@ -60,7 +61,10 @@ Optimize attention operations using different attention kernel implementations:
 
 ### Precision Support
 
-AutoDeploy supports a range of precision formats to enhance model performance, including:
+AutoDeploy supports models with various precision formats, including quantized checkpoints generated by [`TensorRT-Model-Optimizer`](https://github.com/NVIDIA/TensorRT-Model-Optimizer).
 
-- BF16, FP32
-- Quantization formats like FP8.
+**Supported precision types include:**
+
+- BF16 / FP16 / FP32
+- FP8
+- [NVFP4](https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture/)
```
