Quantized Gorilla #160


Merged · 10 commits · Feb 4, 2024

Conversation

@CharlieJCJ (Collaborator) commented Jan 30, 2024

Resolves #77: a demo of local inference with text-generation-webui.

K-quantized gorilla models can be found on Huggingface: Llama-based, MPT-Based, Falcon-Based, gorilla-openfunctions-v0-gguf, gorilla-openfunctions-v1-gguf

A tutorial walkthrough on how to quantize a model using llama.cpp with different quantization methods is documented in Colab.
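For reference, the llama.cpp quantization flow described above looks roughly like the following sketch. Paths and output file names here are illustrative, not the ones used in the Colab; see the tutorial for the exact steps.

```shell
# Build llama.cpp from source.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# 1. Convert the Hugging Face checkpoint to a GGUF file (fp16).
#    /path/to/gorilla-model is a placeholder for the downloaded checkpoint.
python convert.py /path/to/gorilla-model --outfile gorilla-f16.gguf --outtype f16

# 2. K-quantize the fp16 GGUF, e.g. to the Q4_K_M method.
#    Other methods (Q2_K, Q3_K_M, Q5_K_M, ...) follow the same pattern.
./quantize gorilla-f16.gguf gorilla-q4_K_M.gguf Q4_K_M
```

The last argument selects the quantization method, which trades file size and speed against output quality.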

Running local inference with Gorilla on a clean interface is simple: the demo uses text-generation-webui; add your desired models and run inference.
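A minimal sketch of the text-generation-webui setup, assuming a quantized GGUF file like the ones linked above (the exact file name is illustrative):

```shell
# Install text-generation-webui.
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui && pip install -r requirements.txt

# Copy the quantized Gorilla GGUF into the models/ directory, then launch
# the web UI with the llama.cpp loader.
python server.py --model gorilla-openfunctions-v1-q4_K_M.gguf --loader llama.cpp
```

From there, prompts can be entered in the browser UI without any GPU.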

More details in the /inference README.

Co-authored-by: Pranav Ramesh [email protected]

@ShishirPatil (Owner)

Thanks for the PR @CharlieJCJ and @pranramesh!

Did you get a chance to test all three models? I remember Falcon and MPT needed some minor tweaks, so it's good to confirm they work functionally and that the outputs make logical sense.

Also, while you are at it, do you mind quantizing openfunctions-v0 and openfunctions-v1 as well? You don't have to have the inference scripts ready now.

Rest of it looks good, I'll go ahead and merge.

@CharlieJCJ (Collaborator, Author) commented Jan 31, 2024

Yep, I'll take care of quantizing openfunctions-v0 and openfunctions-v1, and add their links to the README within a day. I've tested the MPT and Falcon ones, and their outputs are consistent with the original models.

@CharlieJCJ (Collaborator, Author) commented Jan 31, 2024

Quantized versions of openfunctions-v0 and openfunctions-v1 have been generated and uploaded.

K-quantized gorilla-openfunctions-v0 and gorilla-openfunctions-v1 models can be found on Huggingface: gorilla-openfunctions-v0-gguf, gorilla-openfunctions-v1-gguf

/inference README updated.

@ShishirPatil (Owner) left a comment


LGTM.

One potential follow-up is to benchmark the performance of these quantized models against their full-precision counterparts.

@ShishirPatil ShishirPatil merged commit 95edaa0 into ShishirPatil:main Feb 4, 2024
Successfully merging this pull request may close these issues:

[feature] Run gorilla locally without GPUs 🦍