Closed
Description
Great paper and thanks for open sourcing the code.
A couple of questions:
- Is the benchmarking code from Section 4 of the paper available (`GEMM`, `FastFP16toInt8`)?
- In the per-group `W4A8` kernel, why is there a need for an additional channel-wise scale factor in `FusedDequantQuant`? I.e., the `Int4` weights are dequantized to `FP16` using group-wise scale factors, then quantized to `Int8` using an additional channel-wise scale, and then fed to the `Int8` GEMM. In contrast, in the channel-wise `W4A8` kernel, the `Int4` weights are converted directly to `Int8` and then fed to the `Int8` GEMM. (A sketch of my reading follows below.)
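For concreteness, here is a minimal NumPy sketch of the two paths as I understand them. The shapes, the names (`w_int4`, `s_group`, `s_chan`), and the max-abs choice of requantization scale are illustrative assumptions on my part, not the repo's actual kernel code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes only; the real kernel layout is an assumption here.
out_ch, in_ch, group = 8, 64, 32
w_int4 = rng.integers(-8, 8, size=(out_ch, in_ch)).astype(np.int8)         # Int4 values stored in int8
s_group = (rng.random((out_ch, in_ch // group)) + 0.5).astype(np.float16)  # group-wise scales

# Per-group path: Int4 -> FP16 via group-wise scales, then FP16 -> Int8 via
# one extra channel-wise scale (max-abs requantization is my guess), and the
# Int8 result is what feeds the Int8 GEMM.
w_fp16 = w_int4.astype(np.float16) * np.repeat(s_group, group, axis=1)
s_chan = np.abs(w_fp16).max(axis=1, keepdims=True).astype(np.float32) / 127
w_int8 = np.clip(np.rint(w_fp16 / s_chan), -128, 127).astype(np.int8)

# Channel-wise path: with group == in_ch there is only one scale per output
# channel, so it can be folded into the GEMM epilogue and the Int4 weights
# can be widened to Int8 directly, with no intermediate FP16 dequantization.
w_int8_direct = w_int4  # already valid Int8; single scale applied after the GEMM
```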