I am trying to do int8 quantization on a PyTorch model and am confused about how to select the accuracy-aware method and provide the maximum accuracy drop. Any help with this would be appreciated.
Hi @camhpj,
`nncf.quantize_with_accuracy_control` does not yet support PyTorch models directly. Instead, you can export the PyTorch model to OpenVINO or ONNX and run `nncf.quantize_with_accuracy_control` on the exported model.
OpenVINO example: https://github.com/openvinotoolkit/nncf/tree/develop/examples/post_training_quantization/openvino/yolov8_quantize_with_accuracy_control
ONNX example: https://github.com/openvinotoolkit/nncf/tree/develop/examples/post_training_quantization/onnx/yolov8_quantize_with_accuracy_control
`nncf.quantize_with_accuracy_control` controls the accuracy metric by keeping the most impactful operations in the model in the original precision (see the OpenVINO documentation). …
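
For reference, here is a minimal sketch of how that flow could look for a PyTorch model exported to OpenVINO. The model, data loaders, input shape, transform function, and `validate` metric below are placeholders I made up for illustration; the key part for your question is the `max_drop` and `drop_type` arguments, which set the maximum allowed accuracy drop. The exact `validation_fn` signature and return convention can vary between NNCF versions, so please cross-check against the linked examples.

```python
import nncf
import openvino as ov
import torch

# Placeholders (assumptions): replace with your own model, loaders, and metric.
torch_model = ...            # your trained torch.nn.Module
calibration_loader = ...     # torch.utils.data.DataLoader for calibration
validation_loader = ...      # torch.utils.data.DataLoader for validation

# 1. Export the PyTorch model to OpenVINO (in memory).
example_input = torch.randn(1, 3, 224, 224)   # adjust to your input shape
ov_model = ov.convert_model(torch_model, example_input=example_input)

# 2. Wrap the loaders in nncf.Dataset; transform_fn maps a data item to model input.
def transform_fn(data_item):
    images, _labels = data_item
    return images.numpy()

calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)
validation_dataset = nncf.Dataset(validation_loader, transform_fn)

# 3. Validation function: receives the compiled model and the validation data
#    items, and returns the metric value (here a toy top-1 accuracy).
def validate(compiled_model: ov.CompiledModel, validation_items) -> float:
    correct = 0
    total = 0
    for images, labels in validation_items:
        output = compiled_model(images.numpy())[0]
        correct += (output.argmax(axis=-1) == labels.numpy()).sum()
        total += labels.shape[0]
    return correct / total

# 4. Accuracy-aware quantization: max_drop is the maximum allowed metric
#    degradation, drop_type selects absolute vs. relative interpretation.
quantized_model = nncf.quantize_with_accuracy_control(
    ov_model,
    calibration_dataset=calibration_dataset,
    validation_dataset=validation_dataset,
    validation_fn=validate,
    max_drop=0.01,                       # e.g. allow at most 0.01 absolute drop
    drop_type=nncf.DropType.ABSOLUTE,
)

ov.save_model(quantized_model, "model_int8.xml")
```

With this setup NNCF quantizes the whole model, measures the metric with your `validation_fn`, and reverts the most accuracy-sensitive layers to the original precision until the drop is within `max_drop`.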