Difference between W8A8BFP32OFP32LinearWithQuantScale and W8A8BFP32OFP32Linear #10

@Hongbosherlock

Description

for example here:
https://github.com/AniZpZ/AutoSmoothQuant/blob/main/autosmoothquant/models/llama.py#L89

int8_module.q_proj = W8A8BFP32OFP32Linear.from_float(
            module.q_proj, attn_input_scale, act_quant=int8_module.q_quant_type)
int8_module.o_proj = W8A8BFP32OFP32LinearWithQuantScale.from_float(
            module.o_proj, out_input_scale, act_quant=int8_module.o_quant_type)

Is the difference whether it involves quant_scale or not?

Also, quant_scale is for the activation x and dequant_scale is for the weight, right?
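For context, here is a minimal sketch of how the two scales typically interact in a static W8A8 linear layer (int8 weights and activations, fp32 output). This is an illustration of the general technique, not the actual AutoSmoothQuant implementation; the function and parameter names are hypothetical. quant_scale maps fp32 activations into the int8 range, and dequant_scale (the product of the activation scale and the per-tensor weight scale) maps the int32 accumulator back to fp32:

```python
import numpy as np

def quantize_act(x_fp32, quant_scale):
    # quant_scale: static scale for the activation x (fp32 -> int8)
    return np.clip(np.round(x_fp32 / quant_scale), -128, 127).astype(np.int8)

def w8a8_linear(x_fp32, w_int8, quant_scale, dequant_scale):
    # Quantize the activation with quant_scale, run the int8 matmul in
    # int32, then rescale the accumulator back to fp32 with dequant_scale
    # (= quant_scale * weight_scale).
    x_int8 = quantize_act(x_fp32, quant_scale)
    acc_int32 = x_int8.astype(np.int32) @ w_int8.astype(np.int32).T
    return acc_int32.astype(np.float32) * dequant_scale
```

Under this reading, the "WithQuantScale" variant carries the extra activation quant_scale so it can quantize its fp32 input on the fly, while the plain variant does not need it.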
