
Commit a5e4c46

Add models and features supporting matrix.
Signed-off-by: Qiliang Cui <[email protected]>
1 parent 7b1895e commit a5e4c46

3 files changed: +50 −17 lines

docs/.nav.yml

Lines changed: 1 addition & 0 deletions
@@ -39,6 +39,7 @@ nav:
       - models/generative_models.md
       - models/pooling_models.md
       - models/extensions
+      - Hardware Supported Models: models/hardware_supported_models
   - Features:
       - features/compatibility_matrix.md
       - features/*

docs/features/compatibility_matrix.md

Lines changed: 17 additions & 17 deletions
@@ -59,23 +59,23 @@ th:not(:first-child) {

 ## Feature x Hardware

-| Feature | Volta | Turing | Ampere | Ada | Hopper | CPU | AMD |
-|---------|-------|--------|--------|-----|--------|-----|-----|
-| [CP][chunked-prefill] | [❌](gh-issue:2729) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| [APC][automatic-prefix-caching] | [❌](gh-issue:3687) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| [LoRA][lora-adapter] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| <abbr title="Prompt Adapter">prmpt adptr</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](gh-issue:8475) | ✅ |
-| [SD][spec-decode] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
-| <abbr title="Pooling Models">pooling</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ |
-| <abbr title="Encoder-Decoder Models">enc-dec</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
-| <abbr title="Multimodal Inputs">mm</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| <abbr title="Logprobs">logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| <abbr title="Prompt Logprobs">prmpt logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| <abbr title="Async Output Processing">async output</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
-| multi-step | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](gh-issue:8477) | ✅ |
-| best-of | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| beam-search | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| Feature | Volta | Turing | Ampere | Ada | Hopper | CPU | AMD | TPU |
+|---------|-------|--------|--------|-----|--------|-----|-----|-----|
+| [CP][chunked-prefill] | [❌](gh-issue:2729) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [APC][automatic-prefix-caching] | [❌](gh-issue:3687) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [LoRA][lora-adapter] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| <abbr title="Prompt Adapter">prmpt adptr</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](gh-issue:8475) | ✅ | ✅ |
+| [SD][spec-decode] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
+| <abbr title="Pooling Models">pooling</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❌ |
+| <abbr title="Encoder-Decoder Models">enc-dec</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
+| <abbr title="Multimodal Inputs">mm</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
+| <abbr title="Logprobs">logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
+| <abbr title="Prompt Logprobs">prmpt logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
+| <abbr title="Async Output Processing">async output</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
+| multi-step | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](gh-issue:8477) | ✅ | ✅ |
+| best-of | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| beam-search | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |

 !!! note
     Please refer to [Feature support through NxD Inference backend][feature-support-through-nxd-inference-backend] for features supported on AWS Neuron hardware.
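As a quick illustration (a minimal sketch, not taken from this commit), the snippet below shows how two of the rows above, chunked prefill (CP) and automatic prefix caching (APC), would typically be switched on through vLLM's offline API. The model name and token budget are placeholders, and the keyword arguments assume the engine-argument names used by recent vLLM releases.

```python
# Minimal sketch, assuming recent vLLM engine arguments; model and sizes are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",   # placeholder model choice
    enable_chunked_prefill=True,        # "CP" row of the matrix
    enable_prefix_caching=True,         # "APC" row of the matrix
    max_num_batched_tokens=2048,        # per-step token budget used by chunked prefill
)

outputs = llm.generate(
    ["Explain chunked prefill in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```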
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+---
+title: TPU
+---
+[](){ #tpu-supported-models }
+
+# TPU Supported Models
+## Text-only Language Models
+
+| Model | Supported |
+|-------|-----------|
+| mistralai/Mixtral-8x7B-Instruct-v0.1 | 🟨 |
+| mistralai/Mistral-Small-24B-Instruct-2501 | ✅ |
+| mistralai/Codestral-22B-v0.1 | ✅ |
+| mistralai/Mixtral-8x22B-Instruct-v0.1 | ❌ |
+| meta-llama/Llama-3.3-70B-Instruct | ✅ |
+| meta-llama/Llama-3.1-8B-Instruct | 🟨 |
+| meta-llama/Llama-3.1-70B-Instruct | ✅ |
+| meta-llama/Llama-4-* | ❌ |
+| microsoft/Phi-3-mini-128k-instruct | 🟨 |
+| microsoft/phi-4 | ❌ |
+| google/gemma-3-27b-it | 🟨 |
+| google/gemma-3-4b-it | ❌ |
+| deepseek-ai/DeepSeek-R1 | ❌ |
+| deepseek-ai/DeepSeek-V3 | ❌ |
+| RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8 | 🟨 |
+| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w8a8 | ✅ |
+| Qwen/Qwen3-8B | 🟨 |
+| Qwen/Qwen3-32B | ✅ |
+| Qwen/Qwen2.5-7B-Instruct | ✅ |
+| Qwen/Qwen2.5-32B | ✅ |
+| Qwen/Qwen2.5-14B-Instruct | ✅ |
+| Qwen/Qwen2.5-1.5B-Instruct | 🟨 |
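For context, one way to exercise a model marked ✅ in this table is through the same offline API, as sketched below; this is an illustrative sketch rather than part of the commit. It assumes a vLLM build with the TPU backend installed, and the model choice, context length, and parallelism degree are placeholders, not recommendations.

```python
# Minimal sketch, assuming a vLLM installation whose TPU backend is available.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-32B",   # marked ✅ in the table above
    max_model_len=2048,         # placeholder context length
    tensor_parallel_size=4,     # placeholder: set to the number of TPU chips in use
)

result = llm.generate(["Hello from a TPU host."], SamplingParams(max_tokens=32))
print(result[0].outputs[0].text)
```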
