Quantized Gorilla #160
Conversation
added self-contained Colab tutorial for llama.cpp local inference with Gorilla
Co-authored-by: Pranav Ramesh <[email protected]>
Thanks for the PR @CharlieJCJ and @pranramesh! Did you get a chance to test all three models? I remember Falcon and MPT needed some minor tweaks, so it's good to confirm they work functionally and that the outputs make logical sense. Also, while you are at it, do you mind quantizing the openfunctions models as well? The rest of it looks good; I'll go ahead and merge.
Yep, I'll take care of quantizing.
K-quantized
LGTM.
One potential follow-up is to benchmark the performance of these models against the full-precision ones.
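For reference, a minimal sketch of what such a comparison could look like with llama-cpp-python (the GGUF filename and prompt below are placeholders, not artifacts of this PR); the same prompts would then be run against the full-precision checkpoints to compare output quality:

```python
# Rough latency/throughput check for a K-quantized Gorilla GGUF.
# model_path and the prompt are placeholders; rerun the same prompts
# against the full-precision checkpoint to compare output quality.
import time
from llama_cpp import Llama

llm = Llama(model_path="gorilla-7b-hf-v1-q4_K_M.gguf", n_ctx=2048, verbose=False)

prompt = "I would like to translate 'I feel very good today.' from English to Chinese."
start = time.perf_counter()
out = llm(prompt, max_tokens=256, temperature=0.0)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.2f}s ({n_tokens / elapsed:.1f} tok/s)")
print(out["choices"][0]["text"])
```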
Resolves #77, with a demo displaying local inference with text-generation-webui.
K-quantized Gorilla models can be found on Hugging Face: Llama-based, MPT-based, Falcon-based, gorilla-openfunctions-v0-gguf, and gorilla-openfunctions-v1-gguf (a download sketch follows below).
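A minimal sketch for fetching one of these GGUF files from the Hub (the repo id and filename are placeholders; use the exact names from the model pages above):

```python
# Download one K-quantized GGUF file from the Hugging Face Hub.
# repo_id and filename are placeholders -- use the names from the model pages above.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="gorilla-llm/gorilla-openfunctions-v1-gguf",
    filename="gorilla-openfunctions-v1-q4_K_M.gguf",
)
print("Saved to", model_path)
```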
A tutorial walkthrough on how to quantize a model using llama.cpp with different quantization methods is documented in the Colab notebook.
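At a high level the walkthrough follows the standard llama.cpp flow; a rough sketch, assuming a local llama.cpp checkout (script and binary names vary between llama.cpp versions, and all paths are placeholders):

```python
# Sketch of the usual llama.cpp quantization flow, driven from Python.
# Script/binary names (convert.py, ./quantize) match llama.cpp releases
# current at the time of this PR and may differ in newer checkouts.
import subprocess

HF_MODEL_DIR = "gorilla-7b-hf-v1"          # placeholder: local HF checkpoint
F16_GGUF = "gorilla-7b-hf-v1-f16.gguf"     # intermediate full-precision GGUF

# 1) Convert the Hugging Face checkpoint to GGUF at f16.
subprocess.run(
    ["python", "convert.py", HF_MODEL_DIR, "--outtype", "f16", "--outfile", F16_GGUF],
    check=True,
)

# 2) K-quantize to one or more target formats.
for method in ["Q4_K_M", "Q5_K_M"]:
    subprocess.run(
        ["./quantize", F16_GGUF, f"gorilla-7b-hf-v1-{method}.gguf", method],
        check=True,
    )
```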
Running local inference with Gorilla through a clean interface is simple: as demoed with text-generation-webui, add your desired models and run inference.
More details in the /inference README.

Co-authored-by: Pranav Ramesh <[email protected]>