Hi,
I am trying to convert a model to INT8 following https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/onnx/mobilenet_v2/main.py
The original model is 140 MB on disk, but the converted INT8 model is still 141 MB. I would expect it to be about 1/4 the size of the FP32 model. Below is the script I use to convert the model.
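For reference, the flow in the linked mobilenet_v2 example reduces to roughly the sketch below. This is not the poster's script; the file names, input shape, and calibration data are placeholder assumptions.

```python
# Minimal sketch of the post-training quantization flow from the linked
# mobilenet_v2 example (not the poster's actual script). The model paths,
# input shape, and random calibration batches are placeholder assumptions.
import numpy as np
import onnx
import nncf

model = onnx.load("mobilenet_v2_fp32.onnx")
input_name = model.graph.input[0].name

# In the real example this is a representative slice of the validation set;
# random data is used here only to keep the sketch self-contained.
calibration_data = [
    np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(300)
]

def transform_fn(batch):
    # Map one calibration item to the dict of model inputs.
    return {input_name: batch}

calibration_dataset = nncf.Dataset(calibration_data, transform_fn)
quantized_model = nncf.quantize(model, calibration_dataset)

onnx.save(quantized_model, "mobilenet_v2_int8.onnx")
```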
Replies: 3 comments 1 reply
- Hi @qiuzhewei. Thanks for your question! Currently NNCF doesn't save weights as int8 after quantization of an ONNX model (unlike Torch or OV). We're working on it, and hopefully it will be ready by the next NNCF release.
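One quick way to see what this means in practice is to inspect the saved model with the onnx package: the quantized graph gains QuantizeLinear/DequantizeLinear nodes, but the weight initializers are still stored as FP32 tensors, which is why the file size barely changes. A minimal check (the model path is a placeholder):

```python
# Count Q/DQ nodes and check the dtypes of the weight initializers in the
# quantized ONNX model. "mobilenet_v2_int8.onnx" is a placeholder path.
from collections import Counter
import onnx

model = onnx.load("mobilenet_v2_int8.onnx")

op_counts = Counter(node.op_type for node in model.graph.node)
print("QuantizeLinear nodes:  ", op_counts.get("QuantizeLinear", 0))
print("DequantizeLinear nodes:", op_counts.get("DequantizeLinear", 0))

dtype_counts = Counter(
    onnx.TensorProto.DataType.Name(init.data_type)
    for init in model.graph.initializer
)
# Expect mostly FLOAT here: the weights themselves are not stored as int8.
print("Initializer dtypes:", dict(dtype_counts))
```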
- @MaximProshin Thanks for the reply! The problem persists when I try to convert the quantized model using Torch. The …
- This PR addresses the issue with the INT8 ONNX model size: #3662