
[Feature]: support for fp8 marlin with MoE #17579

@ehartford


🚀 The feature, motivation and pitch

I want to run Qwen3-235B-A22B on Ampere (A100) in FP8.

I quantized it to W8A16 (FP8 weights, 16-bit activations) using llm-compressor:

https://huggingface.co/cognitivecomputations/Qwen3-235B-A22B-FP8-W8A16
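
For reference, here is a minimal sketch of a weight-only FP8 recipe with llm-compressor. This is not the exact script used to produce the checkpoint above; the config-group field values and the `llmcompressor.transformers` import path are assumptions.

```python
# Hypothetical recipe sketch -- not the exact script used for the linked
# checkpoint. Assumes llm-compressor's data-free oneshot flow.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "Qwen/Qwen3-235B-A22B"
SAVE_DIR = "Qwen3-235B-A22B-FP8-W8A16"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Weight-only FP8: 8-bit float weights, activations left in 16-bit,
# expressed via an explicit config group (assumed field values).
recipe = QuantizationModifier(
    targets="Linear",
    ignore=["lm_head"],
    config_groups={
        "group_0": {
            "targets": ["Linear"],
            "weights": {
                "num_bits": 8,
                "type": "float",
                "symmetric": True,
                "strategy": "channel",
                "dynamic": False,
            },
        }
    },
)

# FP8 weight rounding needs no calibration data, so oneshot runs data-free.
oneshot(model=model, recipe=recipe)

model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```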

but when I run it, I get the following error:

ERROR 05-02 03:16:53 [multiproc_executor.py:435] AssertionError: float16 is required for MoE compressed models. Set dtype=torch.float16
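
For context, a minimal repro sketch (the exact launch command isn't in the report, so the parallelism setting is an assumption):

```python
# Assumed launch config; tensor_parallel_size is a guess for an A100 node.
from vllm import LLM

llm = LLM(
    model="cognitivecomputations/Qwen3-235B-A22B-FP8-W8A16",
    tensor_parallel_size=8,
    dtype="auto",  # resolves to bfloat16 from the model config, tripping the assert
)
```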

Please add support for FP8 MoE in the Marlin kernel.

Alternatives

No response

Additional context

No response

