GPTQ Algorithm Cleanup #120

Merged
13 commits merged from gptq-cleanup into main on Aug 28, 2024
Conversation

@kylesayrs (Collaborator) commented on Aug 27, 2024

Purpose

  1. Clean up implementation for easier reading (comments, better structure)
  2. Allow the algorithm to be skipped if the layer is not being targeted
  3. Fix bug where layer is not frozen after QuantizationModifier
  4. Prevent weight observer misuse
  5. Deprecate the weight_fake_quant use case

Changes

  • ensure that freeze_quantization is True (default), even if QuantizationModifier is wrapped by GPTQModifier
  • implement getattr_chain helper function for getting chained attributes (see the sketch after this list)
  • use getattr_chain to get weight quantization arguments and skip computation if the weight does not have valid args
  • directly use memoryless observer to avoid misuse with unsupported observers
  • perform transpose and float conversion in place to reduce memory use
  • break out logging operations to separate function
  • remove weight_fake_quant cases
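
To illustrate the skip logic, here is a minimal sketch of what a getattr_chain helper and the per-module check might look like. The function body, the attribute chain name (quantization_scheme.weights), and the log message are illustrative assumptions, not the PR's exact code.

```python
from typing import Any

_MISSING = object()  # sentinel to distinguish "no default given"


def getattr_chain(obj: Any, chain_str: str, default: Any = _MISSING) -> Any:
    """Resolve a dotted attribute path, e.g. "quantization_scheme.weights".

    Returns `default` if any attribute along the chain is missing or None;
    raises AttributeError when no default is provided.
    """
    value = obj
    for attr_name in chain_str.split("."):
        if value is None or not hasattr(value, attr_name):
            if default is not _MISSING:
                return default
            raise AttributeError(f"could not resolve {chain_str!r} on {obj!r}")
        value = getattr(value, attr_name)
    return value


# Hypothetical use inside the GPTQ compression loop: modules whose weights
# are not targeted for quantization are skipped instead of compressed.
#
#     quant_args = getattr_chain(module, "quantization_scheme.weights", None)
#     if quant_args is None:
#         logger.debug(f"Skipping unquantized layer {name}")
#         continue
```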

Testing

Regression tested saving, loading, and vLLM inference with a group-quantized model
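
For context, a regression check of that kind might look roughly like the following; the model path and prompt are placeholders, and this is a sketch rather than the actual test script:

```python
from vllm import LLM, SamplingParams

# Placeholder path to a checkpoint saved after GPTQ group quantization
MODEL_PATH = "/path/to/group-quantized-model"

# Load the compressed checkpoint with vLLM and run a quick generation pass
llm = LLM(model=MODEL_PATH)
outputs = llm.generate(
    ["The capital of France is"],
    SamplingParams(temperature=0.0, max_tokens=16),
)

for output in outputs:
    print(output.outputs[0].text)
```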

@kylesayrs requested review from Satrat and rahul-tuli on August 27, 2024, 20:17
@Satrat (Contributor) left a comment

Looks good, thanks for cleaning this up! I had a few minor notes, and could we also add a test to confirm that skipping layers works as intended?

@kylesayrs requested a review from Satrat on August 28, 2024, 02:59
@kylesayrs (Collaborator, Author) commented

@Satrat Can you specify what you're looking for in a skip test?

@Satrat (Contributor) commented on Aug 28, 2024

> @Satrat Can you specify what you're looking for in a skip test?

You could just initialize a model with some modules skipped (more than just the lm_head) and others quantized, then search the logs for the debug string; or testing your getattr_chain helper function directly on the model would be fine too.
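
A minimal sketch of the second option, testing the helper directly on a toy module; the import path and the raising behavior follow the getattr_chain sketch above and are assumptions, not the PR's actual test:

```python
import pytest
import torch

# Assumed import path; adjust to wherever the helper actually lives
from llmcompressor.utils import getattr_chain


def test_getattr_chain_resolves_weight_quantization_args():
    linear = torch.nn.Linear(4, 4)

    # No quantization_scheme attached: the chain falls back to the default
    assert getattr_chain(linear, "quantization_scheme.weights", None) is None

    # Attach a dummy scheme and confirm the chain resolves to its weight args
    class DummyScheme:
        weights = "dummy-weight-args"

    linear.quantization_scheme = DummyScheme()
    args = getattr_chain(linear, "quantization_scheme.weights", None)
    assert args == "dummy-weight-args"

    # Without a default, an unresolvable chain should raise
    with pytest.raises(AttributeError):
        getattr_chain(linear, "missing.attribute")
```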

@Satrat (Contributor) left a comment

Tests LGTM! But there's a failing base test and a failing style test (fix with make style, then make quality).

@kylesayrs (Collaborator, Author) commented

Yeah, the failing base test is caused by a bug from the previous release, which I fixed on the main branch.
See: https://github.com/neuralmagic/compressed-tensors/blame/4b214e582c8434733efea79239cfadec9358b7fb/src/compressed_tensors/quantization/observers/base.py#L165-L167

@kylesayrs (Collaborator, Author) commented

Using my local machine and the main branch of compressed_tensors, I confirmed that tests/llmcompressor/modifiers/ and tests/llmcompressor/transformers/compression/ pass.

@kylesayrs merged commit e64c74d into main on Aug 28, 2024
3 of 7 checks passed
@kylesayrs deleted the gptq-cleanup branch on August 28, 2024, 20:19
markmc pushed a commit to markmc/llm-compressor that referenced this pull request on Nov 13, 2024