GPTQ performance in llmcompressor #1123

@nirey10

Description

Describe the bug
Running GPTQ on a model with llmcompressor doesn't produce the expected results, at least not comparable to auto_gptq. I think there might be a bug in the implementation.

Expected behavior
Much higher accuracy on several metrics after running GPTQ, comparable to auto_gptq.

Environment
Include all relevant environment information:

  1. Ubuntu 22.04.3 LTS
  2. Python 3.10.12
  3. LLM Compressor 0.4.0
  4. torch 2.4.0, transformers 4.48.2
  5. CUDA 12.2

To Reproduce
  1. Use the model "meta-llama/Llama-3.1-8B-Instruct".
  2. Run quantization only with the 2of4_w4a16_group-128_recipe.yaml example to quantize to 4 bit (a sketch of this step follows the list).
  3. Run quantization with auto_gptq to 4 bit with the same parameters.
  4. Evaluate both with lm_eval.
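
For reference, a minimal sketch of the llmcompressor step, following the oneshot entry point used in the project's examples. The calibration dataset, sample count, sequence length, and output directory are assumptions, since they are not stated in this issue:

```python
from transformers import AutoModelForCausalLM
from llmcompressor.transformers import oneshot

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"

# Load the base model to be quantized.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)

# Apply the GPTQ recipe in one shot.
# Dataset, calibration size, sequence length, and output path are assumptions.
oneshot(
    model=model,
    dataset="open_platypus",                        # assumed calibration set
    recipe="2of4_w4a16_group-128_recipe.yaml",      # recipe from the example
    max_seq_length=2048,                            # assumed
    num_calibration_samples=512,                    # assumed
    output_dir="Llama-3.1-8B-Instruct-w4a16-g128",  # assumed
)
```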

llmcompressor results:
arc-c: 0.35
winogrande: 0.6
wikitext: 9.2

auto_gptq results:
arc-c: 0.49
winogrande: 0.72
wikitext: 9.12

base_results:
arc-c: 0.51
winogrande: 0.73
wikitext: 8.6420

As we can see, there is a big difference even though both runs use the GPTQ method.
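
For completeness, the metrics above can be gathered with an lm_eval run along these lines (the checkpoint path and batch size are assumptions):

```python
import lm_eval

# Evaluate a quantized checkpoint on the three tasks reported above.
# The pretrained path and batch size are assumptions.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Llama-3.1-8B-Instruct-w4a16-g128,dtype=auto",
    tasks=["arc_challenge", "winogrande", "wikitext"],
    batch_size=8,
)
print(results["results"])
```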
