GPTQ Algorithm Cleanup #120
Conversation
Looks good, thanks for cleaning this up! Had a few minor notes, and could we also add a test to confirm skipping layers works as intended?
@Satrat Can you specify what you're looking for in a skip test?
You could just initialize a model with some modules skipped (more than just the lm_head) and others quantized, then search the logs for the debug string; or testing your getattr_chain helper function directly on the model would be fine too.
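In case it helps, here is a rough sketch of the second option, exercising the chained-attribute lookup directly on a toy module. The getattr_chain stand-in, DummyScheme, and the quantization_scheme.weights attribute path are illustrative assumptions rather than the PR's exact code; the real helper added in this PR would be imported from llmcompressor instead.

```python
# Hypothetical sketch of the suggested skip test: build a tiny model where one
# submodule carries weight quantization args and the others do not, then check
# that the chained-attribute lookup returns None (the "skip this layer" case).
import torch


def getattr_chain(obj, attr_path, default=None):
    """Local stand-in: resolve a dotted attribute path, returning `default` if any link is missing."""
    current = obj
    for attr in attr_path.split("."):
        if not hasattr(current, attr):
            return default
        current = getattr(current, attr)
    return current


class DummyScheme:
    def __init__(self, weights):
        self.weights = weights  # weight quantization args, or None


def test_skipped_layers_have_no_weight_args():
    model = torch.nn.ModuleDict(
        {
            "quantized": torch.nn.Linear(4, 4),
            "skipped": torch.nn.Linear(4, 4),
            "lm_head": torch.nn.Linear(4, 4),
        }
    )
    # only one layer gets weight quantization args attached
    model["quantized"].quantization_scheme = DummyScheme(weights="int4-group")

    assert getattr_chain(model["quantized"], "quantization_scheme.weights") is not None
    # skipped layers resolve to None, the condition used to skip GPTQ compression
    assert getattr_chain(model["skipped"], "quantization_scheme.weights") is None
    assert getattr_chain(model["lm_head"], "quantization_scheme.weights") is None


test_skipped_layers_have_no_weight_args()
```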
Tests LGTM! But there's a failing base test and a failing style test (fix with make style, then make quality).
Yeah, the failing base test is because of a bug from the previous release, which I fixed in the main branch.
Using my local machine and the main branch of compressed_tensors, I confirmed that the
Purpose
Changes
- Quantization is frozen when freeze_quantization is True (the default), even if QuantizationModifier is wrapped by GPTQModifier
- Added get_attr_chain helper function to be used for getting chained attributes
- Use get_attr_chain to get weight quantization arguments and skip computation if the weight does not have valid args (a rough sketch of this pattern follows the list)
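A minimal sketch of the skip pattern described in the last bullet, assuming a get_attr_chain(module, path, default) style helper. The function names and the quantization_scheme.weights attribute path are placeholders for illustration, not the exact implementation from this PR.

```python
# Illustrative only: skipping GPTQ compression for layers without weight quantization args.
import logging
from functools import reduce

logger = logging.getLogger(__name__)


def get_attr_chain(obj, attr_path, default=None):
    """Resolve a dotted attribute path, returning `default` if any link is missing."""
    try:
        return reduce(getattr, attr_path.split("."), obj)
    except AttributeError:
        return default


def compress_module(name, module):
    # in this sketch the weight quantization args live at module.quantization_scheme.weights
    weight_args = get_attr_chain(module, "quantization_scheme.weights")
    if weight_args is None:
        logger.debug("Skipping unquantized layer %s", name)
        return
    # ... run the GPTQ weight update using `weight_args` ...
```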
Testing
Regression tested saving, loading, and vLLM inference with a group quantized model
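For reference, a minimal sketch of the kind of vLLM inference smoke check referred to above; the model path is a placeholder and this is not the actual regression script.

```python
# Minimal vLLM smoke test for a saved group-quantized model (path is a placeholder).
from vllm import LLM, SamplingParams

llm = LLM(model="./my-group-quantized-model")  # hypothetical local save directory
outputs = llm.generate(["The capital of France is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```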