Initial README #3
Conversation
README.md (Outdated)

```python
oneshot(
    model="TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T",  # sample model
```
I would update this to use TinyLlama/TinyLlama-1.1B-Chat-v1.0
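A minimal sketch of what the updated snippet might look like, assuming the `oneshot` entry point and the recipe/calibration argument names discussed elsewhere in this thread; the output directory and sample counts are illustrative, not from the PR:

```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

# Hedged sketch: swaps in the chat-tuned checkpoint suggested above.
oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",   # reviewer-suggested chat model
    dataset="open_platypus",                      # calibration set (see discussion below)
    recipe=GPTQModifier(scheme="W4A16", targets="Linear"),
    output_dir="TinyLlama-1.1B-Chat-v1.0-W4A16",  # illustrative output path
    max_seq_length=2048,
    num_calibration_samples=512,
)
```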
I think we should instead pass a dataset that the user created here, like this example. The "open_platypus" string is a bit too opaque.
Agreed. We wanted to keep the code in the README minimal; taking a look at what we can do.
```python
# sets parameters for the GPTQ algorithm - target Linear layer weights at 4 bits
gptq = GPTQModifier(scheme="W4A16", targets="Linear")
```
Can we initialize the model outside of the one-shot call via `AutoModelForCausalLM`, or does this require `SparseAutoModelForCausalLM`?
After the HFQuantizer is upstreamed, the base auto model will work.
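A sketch of the pathway under discussion; the `SparseAutoModelForCausalLM` import path and the `oneshot` arguments are assumptions based on the names in this thread:

```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot

# Today: initialize outside the one-shot call with the sparse-aware auto class.
# Once HFQuantizer support is upstreamed, transformers' AutoModelForCausalLM
# should work here instead.
model = SparseAutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0", device_map="auto"
)

oneshot(
    model=model,  # pre-initialized model object rather than a model-ID string
    dataset="open_platypus",
    recipe=GPTQModifier(scheme="W4A16", targets="Linear"),
    output_dir="TinyLlama-1.1B-Chat-v1.0-W4A16",
)
```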
README.md (Outdated)

```
### Inference with vLLM
```
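For reference, a minimal sketch of what this vLLM section might contain; the model path is the illustrative output directory used in the sketches above:

```python
from vllm import LLM, SamplingParams

# Load the compressed checkpoint produced by oneshot (illustrative path).
llm = LLM(model="TinyLlama-1.1B-Chat-v1.0-W4A16")

outputs = llm.generate(
    ["Hello my name is"],
    SamplingParams(max_tokens=50, temperature=0.8),
)
print(outputs[0].outputs[0].text)
```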
Could we also add example code for inference with transformers? Just to make it clear that both are supported, with the caveat that transformers runs in fake-quant mode.
Great idea. Let's add this after the HFQuantizer integration lands, since that will affect that pathway.
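A hedged sketch of what such a transformers example might look like once the HFQuantizer integration lands; as noted above, transformers would execute the weights in fake-quant (dequantized) mode rather than with low-bit kernels:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

SAVE_DIR = "TinyLlama-1.1B-Chat-v1.0-W4A16"  # illustrative output path from above

# Assumes the upstreamed HFQuantizer can decompress the checkpoint on load;
# inference then runs on dequantized weights ("fake quant"), not int4 kernels.
model = AutoModelForCausalLM.from_pretrained(SAVE_DIR, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(SAVE_DIR)

inputs = tokenizer("Hello my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```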