# About LLM Compressor

**LLM Compressor** is an easy-to-use library for optimizing large language models for deployment with vLLM, enabling up to **5X faster, cheaper inference**. It provides a comprehensive toolkit for:

- Applying a wide variety of compression algorithms, including weight and activation quantization, pruning, and more
- Seamlessly integrating with Hugging Face Transformers, Models, and Datasets
- Using a `safetensors`-based file format for compressed model storage that is compatible with `vLLM`
- Supporting performant compression of large models via `accelerate`

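To make the bullets above concrete, here is a minimal sketch of the typical one-shot flow. It is illustrative rather than canonical: the model ID, calibration dataset, and modifier settings are assumptions to adapt to your own setup, and argument names may vary slightly across llm-compressor releases.

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

# Assumed model and calibration dataset; substitute your own.
MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Quantize Linear weights to 4 bits (W4A16) with GPTQ, keeping lm_head in full precision.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

# One-shot calibration and compression; the output directory holds a
# safetensors-based checkpoint in a format vLLM can load directly.
oneshot(
    model=MODEL_ID,
    dataset="open_platypus",
    recipe=recipe,
    output_dir="TinyLlama-1.1B-Chat-v1.0-W4A16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```

The saved directory can then be served as-is, for example with `vllm serve TinyLlama-1.1B-Chat-v1.0-W4A16`.
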
## <div style="display: flex; align-items: center;"><img alt="LLM Compressor Logo" src="assets/llmcompressor-icon.png" width="40" style="vertical-align: middle; margin-right: 10px" /> LLM Compressor</div>

<p align="center">
  <img alt="LLM Compressor Flow" src="assets/llmcompressor-user-flows.png" width="100%" style="max-width: 100%;"/>
</p>

## Recent Updates

!!! info "Llama4 Quantization Support"
    Quantize a Llama4 model to [W4A16](examples/quantization_w4a16.md) or [NVFP4](examples/quantization_w4a4_fp4.md). The resulting checkpoint runs seamlessly in vLLM.

!!! info "Large Model Support with Sequential Onloading"
    As of `llm-compressor>=0.6.0`, you can quantize very large language models on a single GPU. Models are broken into disjoint layers, which are then onloaded to the GPU one layer at a time. For more information on sequential onloading, see [Big Modeling with Sequential Onloading](examples/big_models_with_sequential_onloading.md) as well as the [DeepSeek-R1 Example](examples/quantizing_moe.md).

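As a rough mental model of sequential onloading (a conceptual sketch, not the library's actual implementation), only one layer occupies the GPU at a time while calibration activations are propagated layer by layer; the function below treats each layer as a generic module mapping activations to activations.

```python
import torch
import torch.nn as nn

def compress_sequentially(layers: nn.ModuleList, calib_acts: list, device: str = "cuda"):
    """Conceptual sketch: onload, calibrate, compress, and offload one layer at a time."""
    acts = [a.to(device) for a in calib_acts]
    for layer in layers:
        layer.to(device)                     # onload a single layer
        with torch.no_grad():
            acts = [layer(a) for a in acts]  # propagate calibration data through it
        # ... compute quantization scales / apply pruning for this layer here ...
        layer.to("cpu")                      # offload before moving to the next layer
    return layers
```

Peak GPU memory is therefore roughly one layer plus its activations, which is why models far larger than a single GPU's memory can still be compressed.
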
!!! info "Preliminary FP4 Quantization Support"
    Quantize weights and activations to FP4 and seamlessly run the compressed model in vLLM. Model weights and activations are quantized following the NVFP4 [configuration](https://github.com/neuralmagic/compressed-tensors/blob/f5dbfc336b9c9c361b9fe7ae085d5cb0673e56eb/src/compressed_tensors/quantization/quant_scheme.py#L104). See the examples of [weight-only quantization](examples/quantization_w4a16_fp4.md) and [FP4 activation support](examples/quantization_w4a4_fp4.md). Support is currently preliminary; additional support for MoEs will be added.

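For orientation, a hedged sketch of selecting the FP4 path: the scheme string follows the NVFP4 preset linked above, while the model, dataset, and remaining arguments are placeholders rather than a prescribed recipe.

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Assumed scheme name per the NVFP4 preset; model and dataset are placeholders.
recipe = QuantizationModifier(targets="Linear", scheme="NVFP4", ignore=["lm_head"])

oneshot(
    model="meta-llama/Llama-3.1-8B-Instruct",
    dataset="open_platypus",
    recipe=recipe,
    output_dir="Llama-3.1-8B-Instruct-NVFP4",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```
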
!!! info "Updated AWQ Support"
    Improved support for MoEs, with better handling of larger models.

!!! info "Axolotl Sparse Finetuning Integration"
    Seamlessly finetune sparse LLMs with our Axolotl integration. Learn how to create [fast sparse open-source models with Axolotl and LLM Compressor](https://developers.redhat.com/articles/2025/06/17/axolotl-meets-llm-compressor-fast-sparse-open). See also the [Axolotl integration docs](https://docs.axolotl.ai/docs/custom_integrations.html#llmcompressor).

For more information, check out the [latest release on GitHub](https://github.com/vllm-project/llm-compressor/releases/latest).

## Key Features
