
Incorrect/Garbage Responses for Llama-2-7b-hf with INT4 GPTQ/RTN Asymmetric Quantization #19450

@VishalX

Describe the issue

I am trying to quantize and run the Llama-2-7b-hf model using the example here.

I was able to successfully generate the INT4 model with GPTQ quantization by running the command shown under "To reproduce" below.
Settings:

Namespace(model_input='.\\llama2-7b-fp32\\', model_output='.\\Llama-2-7b-hf-gptq-asym', benchmark=False, quantize=True, batch_size=1, workspace='nc_workspace', algorithm='GPTQ', pad_max=196, seqlen=2048, tasks=['winogrande', 'copa', 'piqa', 'rte', 'hellaswag', 'openbookqa', 'lambada_openai', 'lambada_standard', 'wikitext'], dataset='NeelNanda/pile-10k', block_size=32, is_symmetric=False, accuracy_level=0, sampling_size=8)

However, when I try to run on CPU, I get garbage results for any prompt.

- Prompt: ONNX Runtime is
- Response: ONNX Runtime is  prisoner categorieпута Clientública одногоúblicaública одногоúblicaúblicaúblicapplyúblicaúblicaúblicaúblicaúblicaúblicaúblicażeública geometricúblicażeúblicaúblicaúblicaúblicaúblicaúblicaúblicaúblicaúblicaுúblicaúblicaúblicaże zou[ întRunública Stim cruelF

- Prompt: I want to book a vacation to Hawaii. First, I need to
- Response: I want to book a vacation to Hawaii. First, I need to Statusifier liesStatusifierDOCTYPEissenschaft schedulecmpyed optyed optultan")yed opt diferenелісляcompos into")ultan intoultan optultan \( into oderifierultan rappresentultanел diferenyedyedམła intoyed into")cloudflareел

- Prompt: A good workout routine is
- Response: A good workout routine is 今设 gewesen gewesenісляwardwardwardward musical pueblo gewesen gewesen gewesen gewesenove gewesenoveісля instant zouwardxisісляwardісля instantoveRemoteісля gewesen только estaven толькоxis instantіслярия Wahl только zou서іслярияottiottiaba

- Prompt: How are astronauts launched into space?
- Response: How are astronauts launched into space? emarkemarkemark기 Wahl------+ел기ел기기yed finsелeringелłyyed finsyedелел기othy기 fatyed기temperaturen기기temperaturen thouісляtemperaturen기othy기yed Agutemperaturenелелел thouелinental

Similar output is observed with the RTN asymmetric INT4 model as well.
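
For reference, the RTN variant can be produced with the same settings through onnxruntime's MatMul4BitsQuantizer. A minimal sketch, assuming that API and placeholder model paths (this is not the exact script from the example):

import onnx
from onnxruntime.quantization import matmul_4bits_quantizer

# Load the fp32 source model (placeholder path).
model = onnx.load("llama2-7b-fp32/model.onnx")

# RTN 4-bit weight-only quantization with the settings used above:
# 32-element blocks, asymmetric (zero points), accuracy_level 0 (unset).
quant = matmul_4bits_quantizer.MatMul4BitsQuantizer(
    model,
    block_size=32,
    is_symmetric=False,
    accuracy_level=0,
)
quant.process()

# Save with external data; the 7B weights exceed the 2 GB protobuf limit.
quant.model.save_model_to_file("Llama-2-7b-hf-rtn-asym/model.onnx", True)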

To reproduce

Following the weight-only quantization (WOQ) README in onnxruntime-inference-examples.

python main.py --model_input .\llama2-7b-fp32\ --model_output .\Llama-2-7b-hf-gptq-asym --accuracy_level 0 --quantize --algorithm GPTQ
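
One quick way to verify the quantization step itself is to count the MatMulNBits nodes (ONNX Runtime's 4-bit weight-only contrib op) in the output graph and dump their attributes. A sketch, with a placeholder model path:

import onnx

# Load only the graph structure; skip the external weight files.
model = onnx.load("Llama-2-7b-hf-gptq-asym/model.onnx", load_external_data=False)

nbits = [n for n in model.graph.node if n.op_type == "MatMulNBits"]
print(f"MatMulNBits nodes: {len(nbits)}")

# Dump the attributes of one node (bits, block_size, accuracy_level, ...).
if nbits:
    for attr in nbits[0].attribute:
        print(attr.name, onnx.helper.get_attribute_value(attr))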

I used the inference code from here with the following changes:

use_fp16 = False  # True when KV cache inputs/outputs are in float16
use_buffer_share = False  # True when --use_gqa was passed during export
device = torch.device("cpu")  # running on CPU
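
With those settings, a single-step sanity check on CPU looks roughly like the sketch below. The model path and the input/output names (input_ids, attention_mask, position_ids, past_key_values.{i}.key/.value, logits) are assumptions based on the llama export; verify them against session.get_inputs() and session.get_outputs():

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

# Placeholder paths; point these at the local checkpoint and quantized model.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
session = ort.InferenceSession("Llama-2-7b-hf-gptq-asym/model.onnx",
                               providers=["CPUExecutionProvider"])

enc = tokenizer("ONNX Runtime is", return_tensors="np")
input_ids = enc["input_ids"].astype(np.int64)
attention_mask = enc["attention_mask"].astype(np.int64)
seq_len = input_ids.shape[1]

feed = {
    "input_ids": input_ids,
    "attention_mask": attention_mask,
    "position_ids": np.arange(seq_len, dtype=np.int64)[None, :],
}

# Empty fp32 KV cache (use_fp16=False, use_buffer_share=False).
# Llama-2-7b: 32 layers, 32 heads, head_dim 128.
empty = np.zeros((1, 32, 0, 128), dtype=np.float32)
for i in range(32):
    feed[f"past_key_values.{i}.key"] = empty
    feed[f"past_key_values.{i}.value"] = empty

logits = session.run(["logits"], feed)[0]
next_id = int(np.argmax(logits[0, -1]))
# A healthy model should continue the prompt sensibly; compare with fp32.
print(tokenizer.decode([next_id]))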

Urgency

No response

Platform

Windows

OS Version

Windows 11

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

v1.17.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response
