Faster Inference & Training Roadmap #226

@jeromeku

Description

@danielhanchen

In the unsloth Gemma intro blogpost, you mention a VRAM increase due to Gemma's larger MLP size compared to Llama and Mistral, and show a graph demonstrating decreased memory usage when running unsloth vs. HF and FA2:

  • How does unsloth reduce memory usage?
  • What are the model and runtime configs used to generate the HF vs FA2 vs unsloth graph? Do the numbers reflect inference or training?

Curious which optimizations are driving the memory decrease -- quantization, autograd efficiency, etc.
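
For concreteness, here is a minimal sketch of how one might measure peak VRAM for a single training step of the HF + FA2 baseline. The model name, batch, and dtype are placeholders I picked for illustration, not the configs behind the blogpost graph (that's exactly what the second question above asks about):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-7b"  # placeholder; the blogpost's actual model/config is unknown
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # the "FA2" baseline; requires flash-attn installed
).cuda()

# Identical strings so no padding is needed for this toy batch.
batch = tokenizer(["The quick brown fox"] * 2, return_tensors="pt").to("cuda")

torch.cuda.reset_peak_memory_stats()
out = model(**batch, labels=batch["input_ids"])
out.loss.backward()  # include backward so the measurement covers training, not just inference
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```

Running the same measurement with the unsloth-loaded model would make the comparison in the graph reproducible.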
