
Conversation

Contributor

@Kaihui-intel Kaihui-intel commented Jun 4, 2024

Type of Change

bug fix

Description

Solution: use NumPy for pack_tensor/unpack_tensor on CPU.
Since we found that the torch path on CUDA is faster than CPU in some cases, we use the behavior below as the default:

def pack_tensor(self):
    if "cuda" in self.device:  # XPU may also need the torch path
        pack_tensor_with_torch()
    else:
        pack_tensor_with_numpy()
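For illustration, a minimal sketch of what the NumPy-based packing/unpacking could look like for unsigned 4-bit weights packed into int32 words. The helper names and shapes here are assumptions for the example; the actual change lives in neural_compressor/adaptor/torch_utils/model_wrapper.py:

```python
import numpy as np

def pack_int4_with_numpy(values: np.ndarray) -> np.ndarray:
    """Pack unsigned 4-bit values of shape [n, 8] into one int32 word per row."""
    assert values.shape[1] == 8 and values.min() >= 0 and values.max() < 16
    shifts = np.arange(0, 32, 4, dtype=np.uint32)  # bit offsets 0, 4, ..., 28
    # Shift each nibble to its slot and OR them together via a sum (slots are disjoint).
    packed = (values.astype(np.uint32) << shifts).sum(axis=1, dtype=np.uint32)
    return packed.astype(np.int32)  # modular cast to the signed storage dtype

def unpack_int4_with_numpy(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_int4_with_numpy: recover the [n, 8] array of nibbles."""
    shifts = np.arange(0, 32, 4, dtype=np.uint32)
    return ((packed.astype(np.uint32)[:, None] >> shifts) & 0xF).astype(np.uint8)
```

The vectorized shift-and-sum avoids the per-element Python loop that makes torch-on-CPU packing slow, which is the motivation for dispatching to NumPy when no CUDA device is present.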

Local test on a Xeon(R) 6248:
code: https://github.com/intel/intel-extension-for-transformers/tree/main/examples/huggingface/pytorch/text-generation/quantization

cmd: python run_generation_cpu_woq.py --model Phi-3-mini-4k-instruct --woq --woq_algo Rtn 2>&1 | tee test_np.log

Original result (torch-based packing)

2024-05-29 10:27:16 [INFO] Pass quantize model elapsed time: 1050.21 ms
2024-05-29 10:27:16 [INFO] Save tuning history to /home2/kaihuita/code/intel-extension-for-transformers/examples/huggingface/pytorch/text-generation/quantization/nc_workspace/2024-05-29_10-27-08/./history.snapshot.
2024-05-29 10:27:16 [INFO] [Strategy] Found the model meets accuracy requirements, ending the tuning process.
2024-05-29 10:27:16 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.
2024-05-29 10:27:16 [INFO] Save deploy yaml to /home2/kaihuita/code/intel-extension-for-transformers/examples/huggingface/pytorch/text-generation/quantization/nc_workspace/2024-05-29_10-27-08/deploy.yaml
2024-05-29 10:31:28 [INFO] WeightOnlyQuant done.
2024-05-29 10:36:21 [INFO] Configuration saved in ./saved_results/quantize_config.json

NumPy result (this PR)

2024-06-04 11:28:33 [INFO] Pass quantize model elapsed time: 1242.06 ms
2024-06-04 11:28:33 [INFO] Save tuning history to /home2/kaihuita/code/intel-extension-for-transformers/examples/huggingface/pytorch/text-generation/quantization/nc_workspace/2024-06-04_11-27-20/./history.snapshot.
2024-06-04 11:28:33 [INFO] [Strategy] Found the model meets accuracy requirements, ending the tuning process.
2024-06-04 11:28:33 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.
2024-06-04 11:28:33 [INFO] Save deploy yaml to /home2/kaihuita/code/intel-extension-for-transformers/examples/huggingface/pytorch/text-generation/quantization/nc_workspace/2024-06-04_11-27-20/deploy.yaml
2024-06-04 11:29:46 [INFO] WeightOnlyQuant done.
2024-06-04 11:31:42 [INFO] Configuration saved in ./saved_results/quantize_config.json

Expected Behavior & Potential Risk

The expected behavior triggered by this PR.

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

Signed-off-by: Kaihui-intel <[email protected]>
@Kaihui-intel Kaihui-intel requested a review from xin3he June 4, 2024 03:42
@github-actions

github-actions bot commented Jun 4, 2024

⛈️ Required checks status: Has failure 🔴

Warning
If you do not have access to re-run the Probot, please contact XuehaoSun for help. If you push a new commit, all of the workflows will be re-triggered.

Groups summary

🟢 Code Scan Tests workflow
Check ID Status Error details
Code-Scan success
Code-Scan (Bandit Code Scan Bandit) success
Code-Scan (DocStyle Code Scan DocStyle) success
Code-Scan (Pylint Code Scan Pylint) success

These checks are required after the changes to neural_compressor/adaptor/torch_utils/model_wrapper.py.

🟢 Model Tests workflow
Check ID Status Error details
Model-Test success
Model-Test (Generate Report GenerateReport) success
Model-Test (Run ONNX Model resnet50-v1-12) success
Model-Test (Run PyTorch Model resnet18) success
Model-Test (Run PyTorch Model resnet18_fx) success
Model-Test (Run TensorFlow Model darknet19) success
Model-Test (Run TensorFlow Model inception_v1) success
Model-Test (Run TensorFlow Model resnet-101) success
Model-Test (Run TensorFlow Model resnet50v1.5) success
Model-Test (Run TensorFlow Model ssd_mobilenet_v1_ckpt) success
Model-Test (Run TensorFlow Model ssd_resnet50_v1) success

These checks are required after the changes to neural_compressor/adaptor/torch_utils/model_wrapper.py.

🟢 Unit Tests basic workflow
Check ID Status Error details
UT-Basic success
UT-Basic (Coverage Compare CollectDatafiles) success
UT-Basic (Unit Test FWKs adaptor Test FWKs adaptor) success
UT-Basic (Unit Test FWKs adaptor baseline Test FWKs adaptor baseline) success
UT-Basic (Unit Test ITEX Test ITEX) success
UT-Basic (Unit Test ITEX baseline Test ITEX baseline) success
UT-Basic (Unit Test Pruning Test PyTorch Pruning) success
UT-Basic (Unit Test Pruning Test TensorFlow Pruning) success
UT-Basic (Unit Test Pruning baseline Test PyTorch Pruning baseline) success
UT-Basic (Unit Test Pruning baseline Test TensorFlow Pruning baseline) success
UT-Basic (Unit Test TF newAPI Test TF newAPI) success
UT-Basic (Unit Test TF newAPI baseline Test TF newAPI baseline) success
UT-Basic (Unit Test User facing API Test User facing API) success
UT-Basic (Unit Test User facing API baseline Test User facing API baseline) success
UT-Basic (Unit Test other basic case Test other basic case) success
UT-Basic (Unit Test other cases baseline Test other cases baseline) success
UT-Basic coverage report
Base coverage PR coverage Diff
Lines 86.638% 86.793% 0.155%
Branches 76.191% 76.458% 0.267%

These checks are required after the changes to neural_compressor/adaptor/torch_utils/model_wrapper.py.

🟢 Unit Tests basic no coverage workflow
Check ID Status Error details
UT-Basic-No-Coverage success
UT-Basic-No-Coverage (Unit Test FWKs adaptor Test FWKs adaptor) success
UT-Basic-No-Coverage (Unit Test Pruning Test PyTorch Pruning) success
UT-Basic-No-Coverage (Unit Test Pruning Test TensorFlow Pruning) success
UT-Basic-No-Coverage (Unit Test User facing API Test User facing API) success
UT-Basic-No-Coverage (Unit Test other basic case Test other basic case) success

These checks are required after the changes to neural_compressor/adaptor/torch_utils/model_wrapper.py.

🔴 Unit Tests ITREX workflow
Check ID Status Error details
UT-ITREX failure download

These checks are required after the changes to neural_compressor/adaptor/torch_utils/model_wrapper.py.


Thank you for your contribution! 💜

Note
This comment is automatically generated and will be updated every 180 seconds within the next 6 hours. If you have any other questions, contact chensuyue or XuehaoSun for help.

@chensuyue chensuyue merged commit daa1431 into master Jun 5, 2024
@chensuyue chensuyue deleted the kaihui/pack_2x branch June 5, 2024 05:44
