Has this been supported or requested before?
- I have checked the GitHub README.
- I have checked the Qwen documentation.
- I have checked the documentation of the related framework.
- I have searched the issues and there is not a similar one.
What is this feature about?
a new quantized model
Proposal
Introduction
I would like Qwen to also publish FP8 checkpoints that use group/per-tensor weight scales (rather than only block-wise scales), so they can be deployed with vLLM on sm120 GPUs.
Rationale
Implementation of this feature would help the following use case:
In vllm-project/vllm#17280 it was stated that FP8 is supported on sm120, and https://huggingface.co/RedHatAI/Qwen2.5-3B-FP8-dynamic was used as the test model. A few days later, however, someone reported that Qwen3-14B-FP8 (https://huggingface.co/Qwen/Qwen3-14B-FP8) still cannot run on an RTX PRO 6000. After some investigation I found that on sm120 vLLM only supports group-scaled FP8, not block-scaled FP8, and unfortunately all of the FP8 models published by Qwen are block-scaled. I therefore hope Qwen will also provide group-scaled FP8 models for deployment on sm120. Thanks a lot!
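To make the difference concrete, here is a minimal plain-PyTorch sketch (not code from vLLM or the Qwen repos; the function names are made up for illustration). Per-tensor scaling stands in for the group/dynamic-style scaling used by checkpoints like the RedHatAI FP8-dynamic model, while 128x128 block-wise scaling matches what the Qwen FP8 checkpoints appear to use (their config advertises weight_block_size = [128, 128]). The block-wise variant produces a whole grid of scales that the kernel must consume, and that is the part currently unsupported on sm120.

```python
# Illustrative sketch only, assuming PyTorch >= 2.1 (for torch.float8_e4m3fn).
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # ~448 for e4m3


def quantize_per_tensor(w: torch.Tensor):
    """One scale for the whole weight matrix (group/dynamic-style scaling)."""
    scale = w.abs().max().clamp(min=1e-12) / FP8_MAX
    w_fp8 = (w / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return w_fp8, scale  # a single scalar scale


def quantize_block_128(w: torch.Tensor, block: int = 128):
    """One scale per 128x128 block (block-wise scaling, as in the Qwen FP8 checkpoints)."""
    rows, cols = w.shape
    assert rows % block == 0 and cols % block == 0, "sketch assumes divisible shapes"
    scales = torch.empty(rows // block, cols // block)
    w_fp8 = torch.empty_like(w, dtype=torch.float8_e4m3fn)
    for i in range(0, rows, block):
        for j in range(0, cols, block):
            blk = w[i:i + block, j:j + block]
            s = blk.abs().max().clamp(min=1e-12) / FP8_MAX
            scales[i // block, j // block] = s
            w_fp8[i:i + block, j:j + block] = (blk / s).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return w_fp8, scales  # a grid of scales the FP8 GEMM kernel must consume


if __name__ == "__main__":
    w = torch.randn(256, 256)
    _, s_tensor = quantize_per_tensor(w)
    _, s_block = quantize_block_128(w)
    print("per-tensor scales:", s_tensor.numel())  # 1
    print("block-wise scales:", s_block.numel())   # 4
```

A checkpoint quantized the first way only needs a single (or per-group) scale per weight, which the existing sm120 FP8 kernels can handle; the block-scaled checkpoints require kernel support for the per-block scale grid, which is why they fail to load.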
Contributions are welcomed
- I am willing to help implement this feature.