Changes from 16 commits
2 changes: 2 additions & 0 deletions README.md
@@ -142,6 +142,8 @@ LMDeploy is a toolkit for compressing, deploying, and serving LLM, developed by
<li>DeepSeek-MoE (16B)</li>
<li>DeepSeek-V2 (16B, 236B)</li>
<li>DeepSeek-V2.5 (236B)</li>
<li>DeepSeek-V3 (685B)</li>
<li>DeepSeek-V3.2 (685B)</li>
<li>Mixtral (8x7B, 8x22B)</li>
<li>Gemma (2B - 7B)</li>
<li>StarCoder2 (3B - 15B)</li>
2 changes: 2 additions & 0 deletions README_ja.md
@@ -129,6 +129,8 @@ The LMDeploy TurboMind engine has outstanding inference capabilities and, across various
<li>DeepSeek-MoE (16B)</li>
<li>DeepSeek-V2 (16B, 236B)</li>
<li>DeepSeek-V2.5 (236B)</li>
<li>DeepSeek-V3 (685B)</li>
<li>DeepSeek-V3.2 (685B)</li>
<li>Mixtral (8x7B, 8x22B)</li>
<li>Gemma (2B - 7B)</li>
<li>StarCoder2 (3B - 15B)</li>
2 changes: 2 additions & 0 deletions README_zh-CN.md
@@ -143,6 +143,8 @@ The LMDeploy TurboMind engine has outstanding inference capabilities; on models of all sizes
<li>DeepSeek-MoE (16B)</li>
<li>DeepSeek-V2 (16B, 236B)</li>
<li>DeepSeek-V2.5 (236B)</li>
<li>DeepSeek-V3 (685B)</li>
<li>DeepSeek-V3.2 (685B)</li>
<li>Mixtral (8x7B, 8x22B)</li>
<li>Gemma (2B - 7B)</li>
<li>StarCoder2 (3B - 15B)</li>
2 changes: 2 additions & 0 deletions docs/en/supported_models/supported_models.md
@@ -90,6 +90,8 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
| DeepSeek-MoE | 16B | LLM | Yes | No | No | No | No |
| DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No | No |
| DeepSeek-V2.5 | 236B | LLM | Yes | No | No | No | No |
| DeepSeek-V3 | 685B | LLM | Yes | No | No | No | No |
| DeepSeek-V3.2 | 685B | LLM | Yes | No | No | No | No |
| DeepSeek-VL2 | 3B - 27B | MLLM | Yes | No | No | No | No |
| MiniCPM3 | 4B | LLM | Yes | Yes | Yes | No | No |
| MiniCPM-V-2_6 | 8B | LLM | Yes | No | No | No | Yes |
2 changes: 2 additions & 0 deletions docs/zh_cn/supported_models/supported_models.md
@@ -90,6 +90,8 @@
| DeepSeek-MoE | 16B | LLM | Yes | No | No | No | No |
| DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No | No |
| DeepSeek-V2.5 | 236B | LLM | Yes | No | No | No | No |
| DeepSeek-V3 | 685B | LLM | Yes | No | No | No | No |
| DeepSeek-V3.2 | 685B | LLM | Yes | No | No | No | No |
| DeepSeek-VL2 | 3B - 27B | MLLM | Yes | No | No | No | No |
| MiniCPM3 | 4B | LLM | Yes | Yes | Yes | No | No |
| MiniCPM-V-2_6 | 8B | LLM | Yes | No | No | No | Yes |
3 changes: 3 additions & 0 deletions lmdeploy/pytorch/backends/attention.py
@@ -15,6 +15,8 @@ class AttentionMetadata:
q_seqlens: torch.Tensor = None
kv_seqlens: torch.Tensor = None
fill_seqlens: torch.Tensor = None
cu_seqlens_q: torch.Tensor = None
cu_seqlens_k: torch.Tensor = None
quant_policy: Literal[0, 4, 8] = 0
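The new `cu_seqlens_q` and `cu_seqlens_k` fields presumably follow the cumulative-sequence-length convention of variable-length attention kernels (for example FlashAttention's varlen API): a prefix sum with a leading zero, so sequence `i` occupies rows `[cu[i], cu[i + 1])` of the packed batch. This diff does not show how lmdeploy fills them, so the sketch below assumes that convention, uses plain Python lists in place of `torch.Tensor`, and introduces hypothetical helpers `to_cu_seqlens` and `split_packed`.

```python
from itertools import accumulate


def to_cu_seqlens(seqlens):
    """Prefix sum with a leading 0: sequence i owns rows [cu[i], cu[i + 1])."""
    return [0, *accumulate(seqlens)]


def split_packed(packed, cu):
    """Recover per-sequence slices from a packed (concatenated) batch."""
    return [packed[cu[i]:cu[i + 1]] for i in range(len(cu) - 1)]


# Hypothetical batch of 3 sequences: q_seqlens counts the new query tokens,
# kv_seqlens also counts tokens already held in the KV cache.
q_seqlens = [4, 1, 3]
kv_seqlens = [4, 10, 7]

cu_seqlens_q = to_cu_seqlens(q_seqlens)   # [0, 4, 5, 8]
cu_seqlens_k = to_cu_seqlens(kv_seqlens)  # [0, 4, 14, 21]

# Query tokens of all sequences packed back to back (strings stand in for vectors).
packed_q = ["a0", "a1", "a2", "a3", "b0", "c0", "c1", "c2"]
per_seq = split_packed(packed_q, cu_seqlens_q)
```

Storing offsets instead of lengths lets a kernel locate any sequence's slice in constant time; in FlashAttention these are `int32` tensors of length `batch_size + 1`.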


@@ -70,6 +72,7 @@ def forward(
k_scales_zeros: torch.Tensor = None,
v_scales_zeros: torch.Tensor = None,
learnable_sink: torch.Tensor = None,
nsa_indices: torch.Tensor = None,
inplace: bool = False,
) -> torch.Tensor:
"""forward."""
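The diff only adds an `nsa_indices` argument to `forward`, without showing how it is consumed. A plausible reading, stated here as an assumption, is that it lists for each query the key/value positions it may attend to, chosen beforehand by a scoring step such as the indexer suggested by the `NSAIndexFP8` op type. The sketch below shows that idea for a single query in plain Python; `topk_indices` is a hypothetical stand-in for the indexer and `sparse_attention` is not lmdeploy's kernel.

```python
import math


def topk_indices(index_scores, k):
    """Hypothetical indexer stand-in: keep the k highest-scoring key positions."""
    order = sorted(range(len(index_scores)), key=lambda j: index_scores[j], reverse=True)
    return sorted(order[:k])


def sparse_attention(q, keys, values, indices):
    """Scaled dot-product attention from one query over only keys[indices]."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, keys[j])) for j in indices]
    m = max(scores)  # subtract the max before exp for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, j in zip(weights, indices):
        for c, v in enumerate(values[j]):
            out[c] += (w / total) * v
    return out


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [5.0, 5.0]]
q = [1.0, 0.0]

# Pretend an indexer scored the 4 cached tokens and only 2 may be attended to.
selected = topk_indices([0.9, 0.1, 0.8, 0.2], k=2)  # [0, 2]
out = sparse_attention(q, keys, values, selected)    # [1.5, 1.0]
```

Because the softmax runs over `k` selected tokens rather than the whole context, the attention step's per-query cost scales with `k` instead of the sequence length.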
1 change: 1 addition & 0 deletions lmdeploy/pytorch/backends/base.py
@@ -31,6 +31,7 @@ class OpType(Enum):
FusedMoEW8A8 = auto()
LinearBlockedF8 = auto()
FusedMoEBlockedF8 = auto()
NSAIndexFP8 = auto()


class OpsBackend(ABC):
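`OpType` members presumably serve as keys that a backend maps to concrete kernel implementations, so adding `NSAIndexFP8` gives each backend a new slot to fill. The diff shows only the `OpsBackend(ABC)` class line, so the sketch below assumes a class-level lookup method named `get_layer_impl_builder` and a hypothetical `ToyBackend` whose builders are placeholder strings; treat the method name, signature, and error behavior as illustrative rather than lmdeploy's actual API.

```python
from abc import ABC, abstractmethod
from enum import Enum, auto


class OpType(Enum):
    # Abridged copy of the enum in the diff; only a few members are reproduced.
    LinearBlockedF8 = auto()
    FusedMoEBlockedF8 = auto()
    NSAIndexFP8 = auto()


class OpsBackend(ABC):
    @classmethod
    @abstractmethod
    def get_layer_impl_builder(cls, layer_type: OpType):
        """Return the builder that creates the implementation of `layer_type`."""


class ToyBackend(OpsBackend):
    """Hypothetical backend that registers one builder per supported OpType."""

    _builders = {
        OpType.LinearBlockedF8: "ToyLinearBlockedF8Builder",
        OpType.NSAIndexFP8: "ToyNSAIndexFP8Builder",
    }

    @classmethod
    def get_layer_impl_builder(cls, layer_type: OpType):
        try:
            return cls._builders[layer_type]
        except KeyError:
            # Fail loudly when a model needs an op this backend has not implemented.
            raise NotImplementedError(
                f"{cls.__name__} does not support {layer_type.name}"
            ) from None
```

Raising on unknown op types, rather than returning `None`, surfaces a missing kernel at model-build time instead of at the first forward pass.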