Commit 80dfcb5

update contents

Signed-off-by: h-guo18 <[email protected]>
1 parent: c1e59fd

File tree: 2 files changed, +23 −5 lines

docs/source/torch/auto_deploy/auto-deploy.md (16 additions, 2 deletions)
```diff
@@ -63,5 +63,19 @@ The exported graph then undergoes a series of automated transformations, includi
 
 ## Roadmap
 
-Check out our [Github Project Board](https://github.com/orgs/NVIDIA/projects/83) to learn more about
-the current progress in AutoDeploy and where you can help.
+We are actively expanding AutoDeploy to support a broader range of model architectures and inference features.
+
+**Upcoming Model Support:**
+
+- Vision-Language Models (VLMs)
+
+- Structured State Space Models (SSMs) and Linear Attention architectures
+
+**Planned Features:**
+
+- Low-Rank Adaptation (LoRA)
+
+- Speculative Decoding for accelerated generation
+
+To track development progress and contribute, visit our [Github Project Board](https://github.com/orgs/NVIDIA/projects/83).
+We welcome community contributions; see our [`CONTRIBUTING.md`](../../../../../examples/auto_deploy/CONTRIBUTING.md) for guidelines.
```

docs/source/torch/auto_deploy/support_matrix.md (7 additions, 3 deletions)
```diff
@@ -8,6 +8,7 @@ The exported graph then undergoes a series of automated transformations, includi
 
 **Bring Your Own Model**: AutoDeploy leverages `torch.export` and dynamic graph pattern matching, enabling seamless integration for a wide variety of models without relying on hard-coded architectures.
 
+We support Hugging Face models that are compatible with `AutoModelForCausalLM` and `AutoModelForImageTextToText`.
 Additionally, we have officially verified support for the following models:
 
 <details>
@@ -60,7 +61,10 @@ Optimize attention operations using different attention kernel implementations:
 
 ### Precision Support
 
-AutoDeploy supports a range of precision formats to enhance model performance, including:
+AutoDeploy supports models with various precision formats, including quantized checkpoints generated by [`TensorRT-Model-Optimizer`](https://github.com/NVIDIA/TensorRT-Model-Optimizer).
 
-- BF16, FP32
-- Quantization formats like FP8.
+**Supported precision types include:**
+
+- BF16 / FP16 / FP32
+- FP8
+- [NVFP4](https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture/)
```
