GPTQ Algorithm Cleanup #120

Merged
13 commits merged from gptq-cleanup into main on Aug 28, 2024
Conversation

@kylesayrs (Collaborator) commented on Aug 27, 2024

Purpose

  1. Clean up implementation for easier reading (comments, better structure)
  2. Allow the algorithm to be skipped if the layer is not being targeted
  3. Fix bug where layer is not frozen after QuantizationModifier
  4. Prevent weight observer misuse
  5. Deprecate the weight_fake_quant use case

Changes

  • ensure that freeze_quantization is True (default), even if QuantizationModifier is wrapped by GPTQModifier
  • implement getattr_chain helper function for getting chained attributes (see the sketch after this list)
  • use getattr_chain to get weight quantization arguments and skip computation if the weight does not have valid args
  • directly use memoryless observer to avoid misuse with unsupported observers
  • perform transpose and float conversion in place to reduce memory use
  • break out logging operations to separate function
  • remove weight_fake_quant cases
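
To illustrate the skip logic, here is a minimal sketch of what a getattr_chain helper and the per-module check might look like. The function body, the attribute chain name (quantization_scheme.weights), and the log message are illustrative assumptions, not the PR's exact code.

```python
from typing import Any

_MISSING = object()  # sentinel to distinguish "no default given"


def getattr_chain(obj: Any, chain_str: str, default: Any = _MISSING) -> Any:
    """Resolve a dotted attribute path, e.g. "quantization_scheme.weights".

    Returns `default` if any attribute along the chain is missing or None;
    raises AttributeError when no default is provided.
    """
    value = obj
    for attr_name in chain_str.split("."):
        if value is None or not hasattr(value, attr_name):
            if default is not _MISSING:
                return default
            raise AttributeError(f"could not resolve {chain_str!r} on {obj!r}")
        value = getattr(value, attr_name)
    return value


# Hypothetical use inside the GPTQ compression loop: modules whose weights
# are not targeted for quantization are skipped instead of compressed.
#
#     quant_args = getattr_chain(module, "quantization_scheme.weights", None)
#     if quant_args is None:
#         logger.debug(f"Skipping unquantized layer {name}")
#         continue
```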

Testing

Regression tested saving, loading, and vLLM inference with a group-quantized model
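
For context, a regression check of that kind might look roughly like the following; the model path and prompt are placeholders, and this is a sketch rather than the actual test script:

```python
from vllm import LLM, SamplingParams

# Placeholder path to a checkpoint saved after GPTQ group quantization
MODEL_PATH = "/path/to/group-quantized-model"

# Load the compressed checkpoint with vLLM and run a quick generation pass
llm = LLM(model=MODEL_PATH)
outputs = llm.generate(
    ["The capital of France is"],
    SamplingParams(temperature=0.0, max_tokens=16),
)

for output in outputs:
    print(output.outputs[0].text)
```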

@kylesayrs requested review from Satrat and rahul-tuli on August 27, 2024, 20:17
@Satrat (Contributor) left a comment

Looks good, thanks for cleaning this up! I had a few minor notes, and could we also add a test to confirm that skipping layers works as intended?

@kylesayrs requested a review from Satrat on August 28, 2024, 02:59
@kylesayrs (Collaborator, Author) commented

@Satrat Can you specify what you're looking for in a skip test?

@Satrat (Contributor) commented on Aug 28, 2024

> @Satrat Can you specify what you're looking for in a skip test?

You could just initialize a model with some modules skipped (more than just the lm_head) and others quantized, then search the logs for the debug string; or testing your getattr_chain helper function directly on the model would be fine too.
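
A minimal sketch of the second option, testing the helper directly on a toy module; the import path and the raising behavior follow the getattr_chain sketch above and are assumptions, not the PR's actual test:

```python
import pytest
import torch

# Assumed import path; adjust to wherever the helper actually lives
from llmcompressor.utils import getattr_chain


def test_getattr_chain_resolves_weight_quantization_args():
    linear = torch.nn.Linear(4, 4)

    # No quantization_scheme attached: the chain falls back to the default
    assert getattr_chain(linear, "quantization_scheme.weights", None) is None

    # Attach a dummy scheme and confirm the chain resolves to its weight args
    class DummyScheme:
        weights = "dummy-weight-args"

    linear.quantization_scheme = DummyScheme()
    args = getattr_chain(linear, "quantization_scheme.weights", None)
    assert args == "dummy-weight-args"

    # Without a default, an unresolvable chain should raise
    with pytest.raises(AttributeError):
        getattr_chain(linear, "missing.attribute")
```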

@Satrat (Contributor) left a comment

Tests LGTM! But there's a failing base test and a failing style test (fix with make style, then make quality).

@kylesayrs (Collaborator, Author) commented

Yeah, the failing base test is caused by a bug from the previous release, which I fixed on the main branch.
See: https://github.com/neuralmagic/compressed-tensors/blame/4b214e582c8434733efea79239cfadec9358b7fb/src/compressed_tensors/quantization/observers/base.py#L165-L167

@kylesayrs (Collaborator, Author) commented

Using my local machine and the main branch of compressed_tensors, I confirmed that tests/llmcompressor/modifiers/ and tests/llmcompressor/transformers/compression/ pass.

@kylesayrs merged commit e64c74d into main on Aug 28, 2024
3 of 7 checks passed
@kylesayrs deleted the gptq-cleanup branch on August 28, 2024, 20:19
markmc pushed a commit to markmc/llm-compressor that referenced this pull request on Nov 13, 2024