Describe the bug
Running GPTQ on a model with llmcompressor does not produce the expected results, at least not results comparable to auto_gptq. I think there might be a bug in the implementation.
Expected behavior
Much higher accuracy on several metrics after running GPTQ, closer to what auto_gptq and the base model achieve.
Environment
Include all relevant environment information:
- Ubuntu 22.04.3 LTS
- Python 3.10.12
- LLM Compressor 0.4.0
- torch 2.4.0, transformers 4.48.2
- CUDA 12.2
To Reproduce
Use the model "meta-llama/Llama-3.1-8B-Instruct".
Run only the quantization stage of the 2of4_w4a16_group-128_recipe.yaml example to quantize to 4 bit (a rough sketch is shown after these steps).
Run quantization with auto_gptq to 4 bit with the same parameters.
Evaluate with lm_eval.
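
For reference, the llm-compressor run looked roughly like the sketch below. The calibration dataset, sample count, sequence length, and output directory are assumptions for illustration and may differ from the actual run; only the model ID and recipe file come from the steps above.

```python
# Rough sketch of the llm-compressor quantization step (assumed calibration
# settings; the recipe is the 2of4_w4a16_group-128_recipe.yaml example).
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.transformers import oneshot

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"
SAVE_DIR = "Llama-3.1-8B-Instruct-W4A16-G128"  # hypothetical output path

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Apply the GPTQ recipe in one shot over a calibration dataset
oneshot(
    model=model,
    dataset="open_platypus",            # assumed calibration dataset
    recipe="2of4_w4a16_group-128_recipe.yaml",
    max_seq_length=2048,                # assumed
    num_calibration_samples=512,        # assumed
)

model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```

Evaluation was then done with lm_eval, e.g. `lm_eval --model hf --model_args pretrained=<quantized_model_dir> --tasks arc_challenge,winogrande,wikitext` (exact flags assumed).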
Results:

| Metric | llm-compressor | auto_gptq | base model |
|---|---|---|---|
| arc-c (acc) | 0.35 | 0.49 | 0.51 |
| winogrande (acc) | 0.6 | 0.72 | 0.73 |
| wikitext (ppl, lower is better) | 9.2 | 9.12 | 8.6420 |
As we can see, there is a large gap between the two, even though both use the GPTQ method.