Difference between W8A8BFP32OFP32LinearWithQuantScale and W8A8BFP32OFP32Linear #10

@Hongbosherlock

Description

for example here:
https://github.com/AniZpZ/AutoSmoothQuant/blob/main/autosmoothquant/models/llama.py#L89

int8_module.q_proj = W8A8BFP32OFP32Linear.from_float(
            module.q_proj, attn_input_scale, act_quant=int8_module.q_quant_type)
int8_module.o_proj = W8A8BFP32OFP32LinearWithQuantScale.from_float(
            module.o_proj, out_input_scale, act_quant=int8_module.o_quant_type)

Is the difference whether it involves quant_scale or not?

Also, quant_scale is for the activation x and dequant_scale is for the weight, right?
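For context, here is a minimal sketch of how the two scales typically interact in a static W8A8 linear layer (int8 weights and activations, fp32 output). This is an illustration of the general technique, not the actual AutoSmoothQuant implementation; the function and parameter names are hypothetical. quant_scale maps fp32 activations into the int8 range, and dequant_scale (the product of the activation scale and the per-tensor weight scale) maps the int32 accumulator back to fp32:

```python
import numpy as np

def quantize_act(x_fp32, quant_scale):
    # quant_scale: static scale for the activation x (fp32 -> int8)
    return np.clip(np.round(x_fp32 / quant_scale), -128, 127).astype(np.int8)

def w8a8_linear(x_fp32, w_int8, quant_scale, dequant_scale):
    # Quantize the activation with quant_scale, run the int8 matmul in
    # int32, then rescale the accumulator back to fp32 with dequant_scale
    # (= quant_scale * weight_scale).
    x_int8 = quantize_act(x_fp32, quant_scale)
    acc_int32 = x_int8.astype(np.int32) @ w_int8.astype(np.int32).T
    return acc_int32.astype(np.float32) * dequant_scale
```

Under this reading, the "WithQuantScale" variant carries the extra activation quant_scale so it can quantize its fp32 input on the fly, while the plain variant does not need it.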
