Initial README #3

bfineran · 2024-06-24T19:23:42Z

No description provided.

README.md

robertgshaw2-redhat · 2024-06-24T20:24:59Z

README.md

+
+oneshot(
+    model="TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T",  # sample model


I would update this to use TinyLlama/TinyLlama-1.1B-Chat-v1.0

I think we should instead pass a dataset that the user created here, like this example

https://github.com/robertgshaw2-neuralmagic/vllm-examples/blob/main/quantization/w8a8-example.py

The "open_platypus" is a bit too opaque

agreed - wanted to keep the code minimal in the readme, taking a look at what we can do

README.md

robertgshaw2-redhat · 2024-06-24T20:34:38Z

README.md

+# sets parameters for the GPTQ algorithms - target Linear layer weights at 4 bits
+gptq = GPTQModifier(scheme="W4A16", targets="Linear")
+


Can we initialize the model outside of the one-shot via AutoModelForCausalLM or does this require SparseAutoModelForCausalLM?

after the HFQuantizer is upstreamed the base auto model will work

Satrat · 2024-06-24T21:08:11Z

README.md

+```
+
+### Inference with vLLM


Could we also add example code for inference with transformers? Just to make it clear that both are supported with the caveat that transformers runs in fake quant mode

great idea, let's add this after we can get the HFQuantizer integration landed since that will affect that pathway

Define BaseModels for Quantization

initial readme

a56b163

bfineran requested review from Satrat, markurtz and robertgshaw2-redhat June 24, 2024 19:23

bfineran self-assigned this Jun 24, 2024

compression blurb

d52c555

robertgshaw2-redhat reviewed Jun 24, 2024

View reviewed changes

README.md Outdated Show resolved Hide resolved

robertgshaw2-redhat reviewed Jun 24, 2024

View reviewed changes

README.md Outdated Show resolved Hide resolved

robertgshaw2-redhat reviewed Jun 24, 2024

View reviewed changes

README.md Outdated Show resolved Hide resolved

robertgshaw2-redhat reviewed Jun 24, 2024

View reviewed changes

Satrat reviewed Jun 24, 2024

View reviewed changes

Benjamin and others added 4 commits June 25, 2024 15:55

review response

58232ea

updating README.md

7485797

Merge branch 'main' into init-readme

c0b65fb

update README

f2f5bae

robertgshaw2-redhat approved these changes Jul 7, 2024

View reviewed changes

robertgshaw2-redhat marked this pull request as ready for review July 7, 2024 22:20

cleanup ore

ae26182

robertgshaw2-redhat merged commit aa6558b into main Jul 7, 2024
8 of 12 checks passed

markmc pushed a commit to markmc/llm-compressor that referenced this pull request Nov 13, 2024

Merge pull request vllm-project#3 from neuralmagic/sa/quant_config

318ad21

Define BaseModels for Quantization

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Initial README #3

Initial README #3

Uh oh!

bfineran commented Jun 24, 2024

Uh oh!

Uh oh!

robertgshaw2-redhat Jun 24, 2024

Uh oh!

robertgshaw2-redhat Jun 24, 2024

Uh oh!

bfineran Jun 25, 2024

Uh oh!

Uh oh!

Uh oh!

robertgshaw2-redhat Jun 24, 2024

Uh oh!

bfineran Jun 25, 2024

Uh oh!

Satrat Jun 24, 2024

Uh oh!

bfineran Jun 25, 2024 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!


		oneshot(
		model="TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T", # sample model

		# sets parameters for the GPTQ algorithms - target Linear layer weights at 4 bits
		gptq = GPTQModifier(scheme="W4A16", targets="Linear")

Initial README #3

Initial README #3

Uh oh!

Conversation

bfineran commented Jun 24, 2024

Uh oh!

Uh oh!

robertgshaw2-redhat Jun 24, 2024

Choose a reason for hiding this comment

Uh oh!

robertgshaw2-redhat Jun 24, 2024

Choose a reason for hiding this comment

Uh oh!

bfineran Jun 25, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

robertgshaw2-redhat Jun 24, 2024

Choose a reason for hiding this comment

Uh oh!

bfineran Jun 25, 2024

Choose a reason for hiding this comment

Uh oh!

Satrat Jun 24, 2024

Choose a reason for hiding this comment

Uh oh!

bfineran Jun 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

bfineran Jun 25, 2024 •

edited

Loading