In the unsloth Gemma intro blogpost, you mention a VRAM increase due to the larger MLP size in Gemma compared to Llama and Mistral, and show a graph demonstrating decreased memory usage when running unsloth vs. HF and FA2:
- How does unsloth reduce memory usage?
- What are the model and runtime configs used to generate the HF vs. FA2 vs. unsloth graph? Is it inference or training?

Curious what optimizations are leading to the memory decrease -- quantization, autograd efficiency, etc.
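
For concreteness, here is a minimal sketch of how I would measure peak VRAM with unsloth's loader for the comparison; the checkpoint name, sequence length, and 4-bit flag are my assumptions, not necessarily the blogpost's actual config:

```python
# Minimal VRAM measurement sketch (inference side only).
# Assumptions: the standard FastLanguageModel loading API; the
# checkpoint name and settings below are guesses, not the configs
# used for the blogpost's HF vs. FA2 vs. unsloth graph.
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-7b-bnb-4bit",  # assumed checkpoint
    max_seq_length=2048,                     # assumed sequence length
    load_in_4bit=True,  # is quantization part of the reported savings?
)

# Reset the peak-memory counter, run one forward pass, and read it back.
torch.cuda.reset_peak_memory_stats()
inputs = tokenizer("Hello, Gemma!", return_tensors="pt").to("cuda")
with torch.no_grad():
    model(**inputs)
print(f"Peak VRAM: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```

If the graph was generated during training rather than inference, the same counter could be read after an optimizer step instead, which would also capture gradient and optimizer-state memory.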