
Conversation

Contributor

@Kaihui-intel Kaihui-intel commented Jun 4, 2024

Type of Change

bug fix

Description

Solution: use NumPy for pack_tensor/unpack_tensor on CPU.
Since we found that the torch path on CUDA is faster than CPU in some cases, we use the behavior below as the default:

def pack_tensor(self):
    if "cuda" in self.device:  # XPU may also need the torch path
        pack_tensor_with_torch()
    else:
        pack_tensor_with_numpy()
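For illustration, a minimal sketch of what the NumPy-based packing/unpacking could look like for unsigned 4-bit weights packed into int32 words. The helper names and shapes here are assumptions for the example; the actual change lives in neural_compressor/adaptor/torch_utils/model_wrapper.py:

```python
import numpy as np

def pack_int4_with_numpy(values: np.ndarray) -> np.ndarray:
    """Pack unsigned 4-bit values of shape [n, 8] into one int32 word per row."""
    assert values.shape[1] == 8 and values.min() >= 0 and values.max() < 16
    shifts = np.arange(0, 32, 4, dtype=np.uint32)  # bit offsets 0, 4, ..., 28
    # Shift each nibble to its slot and OR them together via a sum (slots are disjoint).
    packed = (values.astype(np.uint32) << shifts).sum(axis=1, dtype=np.uint32)
    return packed.astype(np.int32)  # modular cast to the signed storage dtype

def unpack_int4_with_numpy(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_int4_with_numpy: recover the [n, 8] array of nibbles."""
    shifts = np.arange(0, 32, 4, dtype=np.uint32)
    return ((packed.astype(np.uint32)[:, None] >> shifts) & 0xF).astype(np.uint8)
```

The vectorized shift-and-sum avoids the per-element Python loop that makes torch-on-CPU packing slow, which is the motivation for dispatching to NumPy when no CUDA device is present.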

Local test on a Xeon(R) 6248:
code: https://github.com/intel/intel-extension-for-transformers/tree/main/examples/huggingface/pytorch/text-generation/quantization

cmd: python run_generation_cpu_woq.py --model Phi-3-mini-4k-instruct --woq --woq_algo Rtn 2>&1 | tee test_np.log

Original result (torch-based packing)

2024-05-29 10:27:16 [INFO] Pass quantize model elapsed time: 1050.21 ms
2024-05-29 10:27:16 [INFO] Save tuning history to /home2/kaihuita/code/intel-extension-for-transformers/examples/huggingface/pytorch/text-generation/quantization/nc_workspace/2024-05-29_10-27-08/./history.snapshot.
2024-05-29 10:27:16 [INFO] [Strategy] Found the model meets accuracy requirements, ending the tuning process.
2024-05-29 10:27:16 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.
2024-05-29 10:27:16 [INFO] Save deploy yaml to /home2/kaihuita/code/intel-extension-for-transformers/examples/huggingface/pytorch/text-generation/quantization/nc_workspace/2024-05-29_10-27-08/deploy.yaml
2024-05-29 10:31:28 [INFO] WeightOnlyQuant done.
2024-05-29 10:36:21 [INFO] Configuration saved in ./saved_results/quantize_config.json

NumPy result (this PR)

2024-06-04 11:28:33 [INFO] Pass quantize model elapsed time: 1242.06 ms
2024-06-04 11:28:33 [INFO] Save tuning history to /home2/kaihuita/code/intel-extension-for-transformers/examples/huggingface/pytorch/text-generation/quantization/nc_workspace/2024-06-04_11-27-20/./history.snapshot.
2024-06-04 11:28:33 [INFO] [Strategy] Found the model meets accuracy requirements, ending the tuning process.
2024-06-04 11:28:33 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.
2024-06-04 11:28:33 [INFO] Save deploy yaml to /home2/kaihuita/code/intel-extension-for-transformers/examples/huggingface/pytorch/text-generation/quantization/nc_workspace/2024-06-04_11-27-20/deploy.yaml
2024-06-04 11:29:46 [INFO] WeightOnlyQuant done.
2024-06-04 11:31:42 [INFO] Configuration saved in ./saved_results/quantize_config.json

Expected Behavior & Potential Risk

The expected behavior triggered by this PR.

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

Signed-off-by: Kaihui-intel <[email protected]>
@Kaihui-intel Kaihui-intel requested a review from xin3he June 4, 2024 03:42
@github-actions

github-actions bot commented Jun 4, 2024

⛈️ Required checks status: Has failure 🔴

Warning
If you do not have access to re-run the Probot, please contact XuehaoSun for help. If you push a new commit, all of the workflows will be re-triggered.

Groups summary

🟢 Code Scan Tests workflow
Check ID Status Error details
Code-Scan success
Code-Scan (Bandit Code Scan Bandit) success
Code-Scan (DocStyle Code Scan DocStyle) success
Code-Scan (Pylint Code Scan Pylint) success

These checks are required after the changes to neural_compressor/adaptor/torch_utils/model_wrapper.py.

🟢 Model Tests workflow
Check ID Status Error details
Model-Test success
Model-Test (Generate Report GenerateReport) success
Model-Test (Run ONNX Model resnet50-v1-12) success
Model-Test (Run PyTorch Model resnet18) success
Model-Test (Run PyTorch Model resnet18_fx) success
Model-Test (Run TensorFlow Model darknet19) success
Model-Test (Run TensorFlow Model inception_v1) success
Model-Test (Run TensorFlow Model resnet-101) success
Model-Test (Run TensorFlow Model resnet50v1.5) success
Model-Test (Run TensorFlow Model ssd_mobilenet_v1_ckpt) success
Model-Test (Run TensorFlow Model ssd_resnet50_v1) success

These checks are required after the changes to neural_compressor/adaptor/torch_utils/model_wrapper.py.

🟢 Unit Tests basic workflow
Check ID Status Error details
UT-Basic success
UT-Basic (Coverage Compare CollectDatafiles) success
UT-Basic (Unit Test FWKs adaptor Test FWKs adaptor) success
UT-Basic (Unit Test FWKs adaptor baseline Test FWKs adaptor baseline) success
UT-Basic (Unit Test ITEX Test ITEX) success
UT-Basic (Unit Test ITEX baseline Test ITEX baseline) success
UT-Basic (Unit Test Pruning Test PyTorch Pruning) success
UT-Basic (Unit Test Pruning Test TensorFlow Pruning) success
UT-Basic (Unit Test Pruning baseline Test PyTorch Pruning baseline) success
UT-Basic (Unit Test Pruning baseline Test TensorFlow Pruning baseline) success
UT-Basic (Unit Test TF newAPI Test TF newAPI) success
UT-Basic (Unit Test TF newAPI baseline Test TF newAPI baseline) success
UT-Basic (Unit Test User facing API Test User facing API) success
UT-Basic (Unit Test User facing API baseline Test User facing API baseline) success
UT-Basic (Unit Test other basic case Test other basic case) success
UT-Basic (Unit Test other cases baseline Test other cases baseline) success
UT-Basic coverage report
Base coverage PR coverage Diff
Lines 86.638% 86.793% 0.155%
Branches 76.191% 76.458% 0.267%

These checks are required after the changes to neural_compressor/adaptor/torch_utils/model_wrapper.py.

🟢 Unit Tests basic no coverage workflow
Check ID Status Error details
UT-Basic-No-Coverage success
UT-Basic-No-Coverage (Unit Test FWKs adaptor Test FWKs adaptor) success
UT-Basic-No-Coverage (Unit Test Pruning Test PyTorch Pruning) success
UT-Basic-No-Coverage (Unit Test Pruning Test TensorFlow Pruning) success
UT-Basic-No-Coverage (Unit Test User facing API Test User facing API) success
UT-Basic-No-Coverage (Unit Test other basic case Test other basic case) success

These checks are required after the changes to neural_compressor/adaptor/torch_utils/model_wrapper.py.

🔴 Unit Tests ITREX workflow
Check ID Status Error details
UT-ITREX failure download

These checks are required after the changes to neural_compressor/adaptor/torch_utils/model_wrapper.py.


Thank you for your contribution! 💜

Note
This comment is automatically generated and will be updated every 180 seconds within the next 6 hours. If you have any other questions, contact chensuyue or XuehaoSun for help.

@chensuyue chensuyue merged commit daa1431 into master Jun 5, 2024
@chensuyue chensuyue deleted the kaihui/pack_2x branch June 5, 2024 05:44
