
[BugFix] Fix quantizaiton_2of4_sparse_w4a16 example #1565


Merged · 25 commits · Jun 18, 2025

Changes from all commits
039c0fe
Update setup.py
dsikka Jun 5, 2025
9d8f510
Update setup.py
dsikka Jun 5, 2025
66658d9
Merge branch 'main' of https://github.com/vllm-project/llm-compressor
shanjiaz Jun 16, 2025
f9d17bd
Merge branch 'main' of https://github.com/vllm-project/llm-compressor
shanjiaz Jun 17, 2025
9a83a88
fix quantization_2of4 example
shanjiaz Jun 18, 2025
869b7c0
Merge branch 'main' into hz-fix-example-quantization-2of4
shanjiaz Jun 18, 2025
a791460
fix style
shanjiaz Jun 18, 2025
29ff051
update quantization stage format
shanjiaz Jun 18, 2025
f9e97d3
Merge branch 'main' into hz-fix-example-quantization-2of4
shanjiaz Jun 18, 2025
3af2bbd
Merge branch 'main' into hz-fix-example-quantization-2of4
shanjiaz Jun 18, 2025
6c081a6
Merge branch 'main' of https://github.com/vllm-project/llm-compressor
shanjiaz Jun 18, 2025
f8f4ea3
fix quantization_2of4 example
shanjiaz Jun 18, 2025
61a5780
fix style
shanjiaz Jun 18, 2025
7d719d6
update quantization stage format
shanjiaz Jun 18, 2025
eb2e4d9
Merge branch 'hz-fix-example-quantization-2of4' of https://github.com…
shanjiaz Jun 18, 2025
f36544e
fix sparsity config
shanjiaz Jun 18, 2025
f1744d7
Merge branch 'main' into hz-fix-example-quantization-2of4
shanjiaz Jun 18, 2025
ad3f2db
remove redundant loading/decompressing step
shanjiaz Jun 18, 2025
dfaacea
Merge branch 'hz-fix-example-quantization-2of4' of https://github.com…
shanjiaz Jun 18, 2025
a7c22ae
fix style
shanjiaz Jun 18, 2025
d79d9ae
simplify loading
shanjiaz Jun 18, 2025
37ef911
Fixed style
shanjiaz Jun 18, 2025
b04242d
Merge branch 'main' into hz-fix-example-quantization-2of4
shanjiaz Jun 18, 2025
112f0a9
fix pathlib usage
shanjiaz Jun 18, 2025
479cd42
Merge branch 'hz-fix-example-quantization-2of4' of https://github.com…
shanjiaz Jun 18, 2025
17 changes: 10 additions & 7 deletions examples/quantization_2of4_sparse_w4a16/llama7b_sparse_w4a16.py
@@ -1,9 +1,10 @@
+from pathlib import Path
+
 import torch
 from loguru import logger
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
 from llmcompressor import oneshot, train
 from llmcompressor.utils import dispatch_for_generation
 
 # load the model in as bfloat16 to save on memory and compute
 model_stub = "neuralmagic/Llama-2-7b-ultrachat200k"
@@ -18,6 +19,7 @@
 
 # save location of quantized model
 output_dir = "output_llama7b_2of4_w4a16_channel"
+output_path = Path(output_dir)
 
 # set dataset config parameters
 splits = {"calibration": "train_gen[:5%]", "train": "train_gen"}
@@ -63,25 +65,26 @@
 # ./output_llama7b_2of4_w4a16_channel/ + (finetuning/sparsity/quantization)_stage
 
 # Oneshot sparsification
-oneshot_applied_model = oneshot(
+
+oneshot(
     model=model,
     **oneshot_kwargs,
     output_dir=output_dir,
     stage="sparsity_stage",
 )
 
 # Sparse finetune
 dispatch_for_generation(model)
-finetune_applied_model = train(
-    model=oneshot_applied_model,
+train(
+    model=(output_path / "sparsity_stage"),
     **oneshot_kwargs,
     **training_kwargs,
     output_dir=output_dir,
     stage="finetuning_stage",
 )
 
 # Oneshot quantization
 model.to("cpu")
-quantized_model = oneshot(
-    model=finetune_applied_model,
+oneshot(
+    model=(output_path / "finetuning_stage"),
     **oneshot_kwargs,
     stage="quantization_stage",
 )
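The core of the fix is that each stage writes its checkpoint under `output_dir/<stage_name>`, and the next stage loads from that saved path rather than reusing the in-memory return value of the previous call. A minimal sketch of that path convention, using only `pathlib` (directory and stage names are taken from the example; the staged save/load behavior of `oneshot` and `train` is assumed, not reimplemented here):

```python
from pathlib import Path

# base output directory used by the example script
output_dir = "output_llama7b_2of4_w4a16_channel"
output_path = Path(output_dir)

# each stage saves its result under output_dir/<stage_name>
sparsity_out = output_path / "sparsity_stage"
finetune_out = output_path / "finetuning_stage"
quantize_out = output_path / "quantization_stage"

# the finetuning stage loads the sparsified checkpoint from disk,
# and the quantization stage loads the finetuned checkpoint
print(sparsity_out)  # output_llama7b_2of4_w4a16_channel/sparsity_stage
print(finetune_out)  # output_llama7b_2of4_w4a16_channel/finetuning_stage
```

Passing paths between stages this way means each stage resumes from a serialized checkpoint, which avoids holding multiple model copies in memory across stages.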