
[QST] Scale factors and benchmarks #2

@jeromeku


Great paper, and thanks for open-sourcing the code.

A couple of questions:

  1. Is the benchmarking code from Section 4 of the paper (GEMM, FastFP16toInt8) available?
  2. In the per-group W4A8 kernel, why is an additional channel-wise scale factor needed in FusedDequantQuant? That is, the Int4 weights are dequantized to FP16 using the group-wise scale factors, then requantized to Int8 using an extra channel-wise scale before being fed to the Int8 GEMM. In contrast, in the channel-wise W4A8 kernel, the Int4 weights are converted directly to Int8 and fed to the Int8 GEMM. (See the sketch after this list.)
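
To make the asymmetry in question 2 concrete, here is a minimal NumPy sketch of the two weight paths as I understand them. The shapes, group size, and symmetric min-max quantizers are illustrative assumptions on my part, not the repository's actual kernel logic:

```python
import numpy as np

# Illustrative shapes and group size (assumptions, not the repo's defaults).
K, N = 256, 128          # reduction dim, output channels
group_size = 64          # per-group granularity along K

rng = np.random.default_rng(0)
w_fp16 = rng.standard_normal((K, N)).astype(np.float16)
w_fp32 = w_fp16.astype(np.float32)   # work in fp32 to keep the sketch simple

# --- Per-group W4A8 path (question 2) ---
# Int4 weights with one scale per (group_size x 1) block along K.
w_groups = w_fp32.reshape(K // group_size, group_size, N)
s_group = np.abs(w_groups).max(axis=1, keepdims=True) / 7.0    # symmetric Int4
w_int4 = np.clip(np.round(w_groups / s_group), -8, 7)

# FusedDequantQuant as I read it: dequantize Int4 -> FP16 with the group
# scales, then requantize to Int8 with an additional channel-wise scale.
w_dq = (w_int4 * s_group).reshape(K, N)
s_channel = np.abs(w_dq).max(axis=0, keepdims=True) / 127.0    # symmetric Int8
w_int8 = np.clip(np.round(w_dq / s_channel), -128, 127).astype(np.int8)
# The Int8 GEMM consumes w_int8; s_channel (together with the activation
# scale) is folded into the epilogue that rescales the Int32 accumulator.

# --- Channel-wise W4A8 path (for contrast) ---
# With a single per-channel scale, Int4 -> Int8 is a pure integer widening
# (e.g. multiply by 16); the scale shrinks by the same factor, so no FP16
# round trip and no second scale factor are needed.
s4 = np.abs(w_fp32).max(axis=0, keepdims=True) / 7.0
w4 = np.clip(np.round(w_fp32 / s4), -8, 7)
w8_direct = (w4 * 16).astype(np.int8)        # effective scale: s4 / 16
```

If this reading is right, my question is why the extra channel-wise rescale in the per-group path is necessary rather than, say, folding the group scales directly into the epilogue.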
