
[BugFix] Fix quantizaiton_2of4_sparse_w4a16 example #1565


Merged · 25 commits merged into main from hz-fix-example-quantization-2of4 on Jun 18, 2025

Conversation

@shanjiaz shanjiaz (Collaborator) commented Jun 18, 2025

SUMMARY:
Pass the saved model directly to the next stage, since `train` and `oneshot` can now properly initialize models from a path.
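
For illustration, a minimal sketch of the resulting flow, assuming the example's `oneshot`/`train` entry points and kwarg names (`model`, `recipe`, `stage`, `output_dir`); the model ID, recipe file, and stage names here are placeholders, not the exact script contents:

```python
from pathlib import Path

from llmcompressor import oneshot, train

output_dir = Path("output_llama7b_2of4_w4a16")  # placeholder path
recipe = "2of4_w4a16_recipe.yaml"               # placeholder recipe

# Stage 1: apply 2:4 sparsity; the result is saved under output_dir.
oneshot(
    model="meta-llama/Llama-2-7b-hf",
    recipe=recipe,
    stage="sparsity_stage",
    output_dir=output_dir,
)

# Stage 2: initialize directly from the saved path -- no manual
# reload or decompression step in the example anymore.
train(
    model=output_dir / "sparsity_stage",
    recipe=recipe,
    stage="finetuning_stage",
    output_dir=output_dir,
)
```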

TEST PLAN:
Testing `test_quantization_2of4_sparse_w4a16` locally

collected 2 items                                                                                                                                    

tests/examples/test_quantization_2of4_sparse_w4a16.py::TestQuantization24SparseW4A16::test_doc_example_command PASSED
tests/examples/test_quantization_2of4_sparse_w4a16.py::TestQuantization24SparseW4A16::test_alternative_recipe PASSED

=========================================================== 2 passed in 6123.28s (1:42:03) ===========================================================


👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: this is required to complete the testing suite; please add the label only once the PR is code complete and local testing has been performed.

@gemini-code-assist bot (Contributor) left a comment

Summary of Changes

Hello @shanjiaz, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a bug in the quantization_2of4_sparse_w4a16 example script. The core issue was that models were not being correctly decompressed when passed from one optimization stage to the next, leading to potential errors or incorrect behavior. The changes introduce explicit model loading and decompression steps between the sparsity, finetuning, and quantization phases, ensuring a correct workflow for the example.

Highlights

  • Corrected Model Flow: Implemented explicit loading of models from the output_dir after the sparsity and finetuning stages, ensuring that the correct model state is retrieved for subsequent operations.
  • Inter-Stage Decompression: Introduced a critical decompression step for models loaded after the sparsity and finetuning stages, utilizing get_model_compressor and decompress_model to properly prepare the models for the next optimization phase (see the sketch after this list).
  • Output Directory Handling: Ensured the output_dir parameter is correctly passed to the oneshot function's keyword arguments, facilitating proper saving and loading of intermediate models.
  • Streamlined Saving: Removed redundant save_pretrained calls for the final quantized model and tokenizer, as the oneshot function now implicitly handles saving to the specified output_dir.
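
A hedged sketch of the inter-stage step these highlights describe; the `get_model_compressor` import location and the `decompress_model` call are taken from the summary above and are assumptions rather than a verified API reference:

```python
from pathlib import Path

from transformers import AutoModelForCausalLM

# Assumed import location for the helper named in the highlights.
from llmcompressor.transformers.compression.compressed_tensors_utils import (
    get_model_compressor,
)

output_dir = Path("output_llama7b_2of4_w4a16")  # placeholder path

# Reload the model saved by the sparsity stage...
model = AutoModelForCausalLM.from_pretrained(
    output_dir / "sparsity_stage", torch_dtype="auto"
)

# ...and decompress it so the finetuning stage sees dense weights.
compressor = get_model_compressor(model=model)
if compressor is not None:  # None when the checkpoint isn't compressed
    compressor.decompress_model(model)
```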

@gemini-code-assist bot (Contributor) left a comment

Code Review

The pull request adds a decompression step between processing stages in the llama7b_sparse_w4a16.py example. The model is reloaded from disk after the sparsity and finetuning stages, and then decompressed. The explicit final save calls were removed, which is correct. Refactoring the repeated model reloading and decompression logic into a helper function would improve code maintainability.
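
Following the reviewer's suggestion, the repeated logic could be wrapped in a helper along these lines (a sketch; the function name is hypothetical and the imports are the same assumptions as above):

```python
from transformers import AutoModelForCausalLM

from llmcompressor.transformers.compression.compressed_tensors_utils import (
    get_model_compressor,  # assumed import location
)


def load_decompressed(path):
    """Reload a stage's saved model and decompress it for the next stage."""
    model = AutoModelForCausalLM.from_pretrained(path, torch_dtype="auto")
    compressor = get_model_compressor(model=model)
    if compressor is not None:
        compressor.decompress_model(model)
    return model
```

Both call sites (after the sparsity stage and after the finetuning stage) would then reduce to a single `load_decompressed(...)` call.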

@shanjiaz shanjiaz added the ready (When a PR is ready for review) label Jun 18, 2025
@shanjiaz shanjiaz marked this pull request as ready for review June 18, 2025 06:05
@shanjiaz shanjiaz removed the ready label Jun 18, 2025
@shanjiaz shanjiaz changed the title [BugFix] Fix quantizaiton_2of4_sparse_w4a16 example [BugFix][WIP] Fix quantizaiton_2of4_sparse_w4a16 example Jun 18, 2025
@dsikka (Collaborator) left a comment

You need to rebase your PR

@shanjiaz shanjiaz added the ready label Jun 18, 2025
@shanjiaz shanjiaz changed the title [BugFix][WIP] Fix quantizaiton_2of4_sparse_w4a16 example [BugFix] Fix quantizaiton_2of4_sparse_w4a16 example Jun 18, 2025
@brian-dellabetta (Collaborator) left a comment

It looks like this keeps references to model, oneshot_applied_model, and finetune_applied_model, so essentially 3 models are living in memory. Same behavior as the original example, but just wanted to call it out.

@shanjiaz (Collaborator, Author) commented Jun 18, 2025

It looks like this keeps references to model, oneshot_applied_model, and finetune_applied_model, so essentially 3 models are living in memory. Same behavior as the original example, but just wanted to call it out.

Good point! Not sure if there's anything I can do about saving three different models. I'm testing to see if I can skip the manual decompression step at least.
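
For what it's worth, a generic way to drop an intermediate model between stages (the variable name is taken from the comment above; whether the example can safely do this is exactly the open question):

```python
import gc

import torch

# `oneshot_applied_model` refers to the example's variable named in the
# comment above; deleting the name lets Python garbage-collect the model.
del oneshot_applied_model
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()  # also release cached GPU memory
```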

dsikka previously approved these changes Jun 18, 2025
@kylesayrs (Collaborator) commented

Looks good, just make sure you're using pathlib in a way you expect
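
The review doesn't say which pathlib behavior is at issue; as a generic illustration (an assumption, not the PR's actual bug), the common surprises look like:

```python
from pathlib import Path

output_dir = Path("output_llama7b_2of4_w4a16")  # placeholder path

# The / operator inserts separators correctly when joining segments...
stage_path = output_dir / "sparsity_stage"

# ...while naive string concatenation silently drops the separator.
wrong = str(output_dir) + "sparsity_stage"  # 'output_llama7b_2of4_w4a16sparsity_stage'

# Some APIs expect str rather than os.PathLike; convert explicitly.
model_arg = str(stage_path)
```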

kylesayrs previously approved these changes Jun 18, 2025
@kylesayrs kylesayrs enabled auto-merge (squash) June 18, 2025 20:59
@shanjiaz shanjiaz dismissed stale reviews from kylesayrs and dsikka via 479cd42 June 18, 2025 21:10
@shanjiaz shanjiaz requested a review from kylesayrs June 18, 2025 21:17
@brian-dellabetta (Collaborator) left a comment

nice

@kylesayrs kylesayrs merged commit 6800f81 into main Jun 18, 2025
11 checks passed
@kylesayrs kylesayrs deleted the hz-fix-example-quantization-2of4 branch June 18, 2025 21:48
@jessiewiswjc commented
@shanjiaz Good job! Have you tested the accuracy of the model produced by this script? I was sad to find that my 14B model's MMLU score dropped a lot (from 72 to 60) after using this script.
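
For anyone wanting to reproduce that measurement, a hedged sketch using lm-evaluation-harness's Python API; the checkpoint path and batch size are placeholders, and the `simple_evaluate` signature should be checked against the installed lm-eval version:

```python
import lm_eval

# Score a saved checkpoint on MMLU.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./output_llama7b_2of4_w4a16,dtype=auto",
    tasks=["mmlu"],
    batch_size=8,
)
print(results["results"]["mmlu"])
```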

aireilly pushed a commit to aireilly/llm-compressor that referenced this pull request Jul 30, 2025
Signed-off-by: shanjiaz <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>