[bug] qwen2.5-3b: model quantized with smooth_quant hits a zeros op bug in graph mode #216

Steps to reproduce:
1. Quantize the model: python -m lmdeploy lite smooth_quant --device npu qwen2.5-3b
2. Run inference: python -m lmdeploy serve api_server qwen2.5-3b-w8a8 --backend pytorch --device ascend. The server starts up without any error.

As soon as a chat request is sent, it fails with the following error:
torch._dynamo.exc.BackendCompilerFailed: backend='atbgraph' raised:
TypeError: AtbOverrides.Zeros() missing 1 required positional argument: 'origin_size'
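
For readers unfamiliar with this kind of failure, the TypeError above is what Python raises when a function that requires an argument named origin_size is called without it. The sketch below reproduces only that error class with a stand-in. FakeOverrides, its three-parameter Zeros signature, and lower_zeros are hypothetical names made up for illustration; the real AtbOverrides.Zeros signature and its call site are not shown in this report.

```python
class FakeOverrides:
    """Hypothetical stand-in for an op-override table; not the real AtbOverrides."""

    @staticmethod
    def Zeros(name, size, origin_size):
        # Assumed signature: the callee requires origin_size.
        return {"op": "zeros", "name": name, "size": size, "origin_size": origin_size}


def lower_zeros(name, size):
    # Hypothetical caller that passes only two arguments,
    # so the required origin_size is never supplied.
    return FakeOverrides.Zeros(name, size)


try:
    lower_zeros("zeros_0", [1, 8])
    message = ""
except TypeError as exc:
    # The message names the missing parameter, as in the traceback above.
    message = str(exc)

print(message)
```

If the real failure has the same shape, the mismatch would sit between the code that lowers the zeros op for the quantized model and the override it calls, but that is an assumption this report alone does not confirm.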

By contrast, running inference with the original (unquantized) model produces no error:
python -m lmdeploy serve api_server qwen2.5-3b --backend pytorch --device ascend. Chat works normally.
