[Examples] Standardize AWQ example #1412
Merged
Commits (8):

- 7bb92f5 standardize awq example (kylesayrs)
- c3414c1 fix typo (kylesayrs)
- 7fc2686 remove unnecessary truncate (kylesayrs)
- 79ed200 rename, change num samples help text, add readme (kylesayrs)
- 72c2dec change num samples, fix link, wrap in main check (kylesayrs)
- b3541d2 Merge remote-tracking branch 'origin' into kylesayrs/fix-awq-example-… (kylesayrs)
- 7a302a6 do not use main wrapper (kylesayrs)
- 825e452 Merge branch 'main' into kylesayrs/fix-awq-example-typo (kylesayrs)
# Quantizing Models with Activation-Aware Quantization (AWQ) #

Activation-Aware Quantization (AWQ) is a state-of-the-art technique for quantizing the weights of large language models. It uses a small calibration dataset to derive scaling factors that reduce the dynamic range of the weights while minimizing accuracy loss on the most salient weight values.

The AWQ implementation found in LLM Compressor is derived from the pioneering work of [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) and was developed with assistance from its original maintainer, [@casper-hansen](https://github.com/casper-hansen).

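The core trick behind AWQ can be illustrated with a short sketch. This is plain Python with made-up numbers, not the library's implementation: for a linear layer, multiplying a weight column by a scale `s` and dividing the matching input channel by `s` leaves the output unchanged, which lets the algorithm migrate quantization difficulty away from salient channels before the weights are actually quantized.

```python
# Illustrative sketch of the AWQ scaling identity; all values are
# hypothetical and this is not llmcompressor's implementation.

def matvec(W, x):
    """Plain dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

W = [[0.5, -1.2], [2.0, 0.3]]   # weight matrix (hypothetical)
x = [4.0, 8.0]                  # calibration activation (hypothetical)
s = [2.0, 0.5]                  # per-input-channel scales (hypothetical)

# Scale each weight column by s and the matching activation by 1/s.
W_scaled = [[w * si for w, si in zip(row, s)] for row in W]
x_scaled = [xi / si for xi, si in zip(x, s)]

# The layer output is mathematically unchanged: (W * s) @ (x / s) == W @ x.
y, y_scaled = matvec(W, x), matvec(W_scaled, x_scaled)
assert all(abs(a - b) < 1e-9 for a, b in zip(y, y_scaled))
```

Because this identity holds exactly, the scales can be chosen to shrink the dynamic range seen by the weight quantizer without changing the network's function prior to quantization.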
## AWQ Recipe ##

The AWQ recipe is defined as follows, where the `AWQModifier` adjusts the model's scales ahead of efficient weight quantization by the `QuantizationModifier`:

```python
recipe = [
    AWQModifier(bits=4, symmetric=False),
    QuantizationModifier(
        ignore=["lm_head"],
        config_groups={
            "group_0": QuantizationScheme(
                targets=["Linear"],
                weights=QuantizationArgs(
                    num_bits=4,
                    type=QuantizationType.INT,
                    dynamic=False,
                    symmetric=False,
                    strategy=QuantizationStrategy.GROUP,
                    group_size=128,
                ),
            )
        },
    ),
]
```

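To make the `QuantizationArgs` above concrete, here is a hedged sketch of what asymmetric, group-wise quantization does. This is plain Python with hypothetical values, not llmcompressor's code, and the group size is shortened from 128 to 4 for readability: each group of weights gets its own scale and zero point, so outliers in one group do not inflate the quantization step of another.

```python
# Illustrative sketch of asymmetric, group-wise INT4 quantization.
# Not llmcompressor's implementation; all values are made up.

def quantize_group(vals, num_bits=4):
    qmin, qmax = 0, 2**num_bits - 1              # unsigned (asymmetric) range
    lo, hi = min(vals), max(vals)
    scale = (hi - lo) / (qmax - qmin) or 1.0     # guard against hi == lo
    zero_point = round(-lo / scale)
    quantized = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in vals]
    dequantized = [(q - zero_point) * scale for q in quantized]
    return quantized, dequantized

weights = [0.12, -0.40, 0.33, 0.05, 1.10, 0.90, 0.70, 1.00]
group_size = 4  # shortened stand-in for the recipe's group_size=128
for start in range(0, len(weights), group_size):
    group = weights[start : start + group_size]
    q, deq = quantize_group(group)
    # Each group has its own scale/zero-point, so reconstruction error
    # stays within about half a quantization step of that group.
    assert all(0 <= qi <= 15 for qi in q)
    assert all(abs(v - d) <= 0.05 for v, d in zip(group, deq))
```

Note how the second group (values near 1.0) gets a much finer quantization step than it would if a single scale had to cover the whole tensor.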
## Compressing Your Own Model ##

To use your own model, start with an existing example and change the `model_id` to match your own model stub.

```python
model_id = "path/to/your/model"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
)
```

## Adding Mappings ##

To target the weight and activation scaling locations within the model, the `AWQModifier` must be provided with an AWQ mapping. For example, the AWQ mapping for the Llama family of models looks like this:

```python
[
    AWQMapping(
        "re:.*input_layernorm",
        ["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"],
    ),
    AWQMapping("re:.*v_proj", ["re:.*o_proj"]),
    AWQMapping(
        "re:.*post_attention_layernorm",
        ["re:.*gate_proj", "re:.*up_proj"],
    ),
    AWQMapping(
        "re:.*up_proj",
        ["re:.*down_proj"],
    ),
]
```

To support other model families, you can supply your own mappings via the `mappings` argument when instantiating the `AWQModifier`, or you can add them to the registry [here](/src/llmcompressor/modifiers/awq/mappings.py) (contributions are welcome!).
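The `re:` prefix in these mappings marks the rest of the string as a regular expression matched against module names. A small sketch of that matching follows; for this illustration we assume full-match semantics against dotted PyTorch module paths (the layer names below are hypothetical, mirroring Llama's layout), which is not necessarily how llmcompressor's resolver is implemented.

```python
import re

def matches(target: str, module_name: str) -> bool:
    """Match a 're:'-prefixed target pattern against a module name.

    Illustrative helper, not llmcompressor's resolver; assumes
    full-match semantics for this sketch.
    """
    assert target.startswith("re:")
    return re.fullmatch(target[len("re:"):], module_name) is not None

# Hypothetical Llama-style module names:
assert matches("re:.*input_layernorm", "model.layers.0.input_layernorm")
assert matches("re:.*q_proj", "model.layers.0.self_attn.q_proj")
assert matches("re:.*post_attention_layernorm",
               "model.layers.7.post_attention_layernorm")
assert not matches("re:.*up_proj", "model.layers.0.mlp.down_proj")
```

Writing patterns this way lets one mapping cover every decoder layer in the model, regardless of depth.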