[BugFix] Fix quantization_2of4_sparse_w4a16 example #1565
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.
Summary of Changes
Hello @shanjiaz, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses a bug in the `quantization_2of4_sparse_w4a16` example script. The core issue was that models were not being correctly decompressed when passed from one optimization stage to the next, leading to potential errors or incorrect behavior. The changes introduce explicit model loading and decompression steps between the sparsity, finetuning, and quantization phases, ensuring a correct workflow for the example.
Highlights
- Corrected Model Flow: Implemented explicit loading of models from the `output_dir` after the sparsity and finetuning stages, ensuring that the correct model state is retrieved for subsequent operations.
- Inter-Stage Decompression: Introduced a critical decompression step for models loaded after the sparsity and finetuning stages, utilizing `get_model_compressor` and `decompress_model` to properly prepare the models for the next optimization phase (see the sketch after this list).
- Output Directory Handling: Ensured the `output_dir` parameter is correctly passed to the `oneshot` function's keyword arguments, facilitating proper saving and loading of intermediate models.
- Streamlined Saving: Removed redundant `save_pretrained` calls for the final quantized model and tokenizer, as the `oneshot` function now implicitly handles saving to the specified `output_dir`.
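To make the inter-stage flow above concrete, here is a minimal sketch assuming the `get_model_compressor` / `decompress_model` helpers named in this PR; the import path, call signatures, and directory names are assumptions for illustration rather than the exact example code.

```python
# Minimal sketch of the reload-and-decompress step between stages.
# The import path, the get_model_compressor signature, and the directory
# names below are assumptions, not verbatim code from the PR.
from transformers import AutoModelForCausalLM

from llmcompressor.pytorch.model_load.helpers import get_model_compressor  # assumed path

output_dir = "output_llama7b_2of4_w4a16_channel"  # hypothetical output directory

# Reload the model that the previous (sparsity) stage saved to disk.
model = AutoModelForCausalLM.from_pretrained(
    f"{output_dir}/sparsity_stage", torch_dtype="auto"
)

# Decompress so the next stage (finetuning, then quantization) sees dense weights.
compressor = get_model_compressor(model=model)  # signature assumed
if compressor is not None:
    compressor.decompress_model(model)
```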
Code Review
The pull request adds a decompression step between processing stages in the `llama7b_sparse_w4a16.py` example. The model is reloaded from disk after the sparsity and finetuning stages and then decompressed. The explicit final save calls were removed, which is correct. Refactoring the repeated model reloading and decompression logic into a helper function would improve code maintainability.
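A minimal sketch of the refactor the review suggests, reusing the helpers named in this PR; the import path, signatures, and directory names are assumptions, not the actual example code.

```python
# Hedged sketch of the suggested refactor: wrap the repeated reload + decompress
# logic in a single helper. Import path and helper signatures are assumptions.
from transformers import AutoModelForCausalLM, PreTrainedModel

from llmcompressor.pytorch.model_load.helpers import get_model_compressor  # assumed path


def reload_and_decompress(stage_dir: str) -> PreTrainedModel:
    """Reload the model a previous stage saved and decompress it for the next stage."""
    model = AutoModelForCausalLM.from_pretrained(stage_dir, torch_dtype="auto")
    compressor = get_model_compressor(model=model)  # signature assumed
    if compressor is not None:
        compressor.decompress_model(model)
    return model


# Hypothetical usage between stages:
model = reload_and_decompress("output_llama7b_2of4_w4a16_channel/sparsity_stage")
# ... run the finetuning stage, then reload its output the same way ...
```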
You need to rebase your PR
It looks like this keeps references to `model`, `oneshot_applied_model`, and `finetune_applied_model`, so essentially three models are living in memory. Same behavior as the original example, but I just wanted to call it out.
Good point! Not sure if there's anything I can do about saving three different models. I'm testing to see if I can skip the manual decompression step at least.
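Not something this PR changes, but one way to address the memory observation above would be to drop the previous stage's reference before loading the next model; a rough sketch, with variable names mirroring the comment rather than the actual example code:

```python
# Rough sketch only: release the previous stage's model so three full models are
# not resident at once. Variable names mirror the comment above, not the example.
import gc

import torch

oneshot_applied_model = None   # drop the sparsity-stage model reference
finetune_applied_model = None  # likewise once the finetuning stage is done
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()   # return freed GPU memory to the allocator
```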
Looks good, just make sure you're using `pathlib` in a way you expect.
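For context on the `pathlib` note, a small illustration of the usual pattern (directory names here are hypothetical):

```python
# Illustration of the pathlib caution above: join paths with the `/` operator
# and convert explicitly when a plain string is required. Names are hypothetical.
from pathlib import Path

output_dir = Path("output_llama7b_2of4_w4a16_channel")
stage_dir = output_dir / "finetuning_stage"  # Path joined with str yields a Path
model_path = str(stage_dir)                  # convert when an API expects a str
print(model_path)
```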
nice
@shanjiaz Good job! Have you tested the accuracy of the model produced by this script? I am sad to find that my 14B model's MMLU score dropped a lot (from 72 to 60) after using this script.
SUMMARY:
Pass the saved model directly to the next stage, since `train` and `oneshot` can now properly initialize models from a path.

TEST PLAN:
Tested `test_quantization_2of4_sparse_w4a16` locally:

```
collected 2 items
tests/examples/test_quantization_2of4_sparse_w4a16.py::TestQuantization24SparseW4A16::test_doc_example_command PASSED
tests/examples/test_quantization_2of4_sparse_w4a16.py::TestQuantization24SparseW4A16::test_alternative_recipe PASSED
=========================================================== 2 passed in 6123.28s (1:42:03) ===========================================================
```

Signed-off-by: shanjiaz <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
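To make the summary concrete, here is a hedged sketch of what passing a saved model path to the next stage looks like; the keyword names, recipe file, and directory layout are assumptions rather than the exact example code.

```python
# Hedged sketch: a later stage is pointed at the directory the previous stage
# saved to, instead of an in-memory model object. Keyword names, the recipe
# file, and the directory layout are assumptions for illustration.
from llmcompressor import oneshot  # assumes the top-level entrypoint

output_dir = "output_llama7b_2of4_w4a16_channel"  # hypothetical

oneshot(
    model=f"{output_dir}/finetuning_stage",  # path written by the previous stage
    recipe="2of4_w4a16_recipe.yaml",         # hypothetical recipe file
    output_dir=output_dir,                   # oneshot saves the result here itself
)
```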