
Commit 94ef00d

Small changes for docs build

Signed-off-by: Aidan Reilly <[email protected]>

1 parent 0a20392

File tree: 4 files changed (+76, −15 lines)

.gitignore

Lines changed: 1 addition & 3 deletions

```diff
@@ -93,9 +93,6 @@ instance/
 # Scrapy stuff:
 .scrapy
 
-# Sphinx documentation
-docs/_build/
-
 # PyBuilder
 target/
 
@@ -129,6 +126,7 @@ venv.bak/
 
 # mkdocs documentation
 /site
+docs/.cache/
 
 # mypy
 .mypy_cache/
```
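To check that the new rule actually covers the mkdocs cache, `git check-ignore` can be pointed at a sample path; the file name below is only a stand-in, not a file from this commit:

```bash
# -v prints the .gitignore file and line of the rule that matched.
git check-ignore -v docs/.cache/example.json
```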

docs/Makefile

Lines changed: 26 additions & 0 deletions (new file)

```makefile
# Minimal mkdocs makefile

PYTHON := python3
MKDOCS_CMD := mkdocs
MKDOCS_CONF := ../mkdocs.yml

.PHONY: help install serve build clean

help:
	@echo "Available targets:"
	@echo "  install    Install dependencies globally"
	@echo "  serve      Serve docs locally"
	@echo "  build      Build static site"
	@echo "  clean      Remove build artifacts"

install:
	pip install -e "../[dev]"

serve:
	$(MKDOCS_CMD) serve --livereload -f $(MKDOCS_CONF)

build:
	$(MKDOCS_CMD) build -f $(MKDOCS_CONF)

clean:
	rm -rf site/ .cache/
```
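Note the `../` paths in `MKDOCS_CONF` and the `install` target: the Makefile assumes it is invoked from inside `docs/`. From the repository root, `make -C` has the same effect; a minimal usage sketch:

```bash
# Install doc dependencies once, then serve with live reload.
make -C docs install
make -C docs serve
```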

docs/README.md

Lines changed: 25 additions & 0 deletions (new file)

````markdown
# Getting started with LLM Compressor docs

- Change into the docs directory:

  ```bash
  cd docs
  ```

- Install the dependencies:

  ```bash
  make install
  ```

- Clean the previous build (optional but recommended):

  ```bash
  make clean
  ```

- Serve the docs:

  ```bash
  make serve
  ```

This starts a local server at http://localhost:8000; open it in your browser to view the documentation.
````
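The README stops at `make serve`; the new Makefile also defines a `build` target for the static site. A minimal sketch of that step, assuming `mkdocs.yml` keeps the default `site_dir` (MkDocs resolves it relative to the config file, so the output should land in `site/` at the repository root, consistent with the `/site` entry in `.gitignore`):

```bash
cd docs
make build     # runs: mkdocs build -f ../mkdocs.yml
ls ../site     # built static site, assuming the default site_dir
```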

docs/index.md

Lines changed: 24 additions & 12 deletions

```diff
@@ -1,24 +1,36 @@
-# Home
+# About LLM Compressor
 
-!!! info "New Feature: Axolotl Sparse Finetuning Integration"
-    Easily finetune sparse LLMs through our seamless integration with Axolotl.
-    [Learn more](https://docs.axolotl.ai/docs/custom_integrations.html#llmcompressor).
+**LLM Compressor** is an easy-to-use library for optimizing large language models for deployment with vLLM, enabling up to **5X faster, cheaper inference**. It provides a comprehensive toolkit for:
 
-!!! info "New Feature: AutoAWQ Integration"
-    Perform low-bit weight-only quantization efficiently using AutoAWQ, now part of LLM Compressor. [Learn more](https://github.com/vllm-project/llm-compressor/pull/1177).
+- Applying a wide variety of compression algorithms, including weight and activation quantization, pruning, and more
+- Seamlessly integrating with Hugging Face Transformers, Models, and Datasets
+- Using a `safetensors`-based file format for compressed model storage that is compatible with `vLLM`
+- Supporting performant compression of large models via `accelerate`
 
 ## <div style="display: flex; align-items: center;"><img alt="LLM Compressor Logo" src="assets/llmcompressor-icon.png" width="40" style="vertical-align: middle; margin-right: 10px" /> LLM Compressor</div>
 
 <p align="center">
-    <img alt="LLM Compressor Flow" src="assets/llmcompressor-user-flows.png" width="100%" style="max-width: 100%;"s />
+    <img alt="LLM Compressor Flow" src="assets/llmcompressor-user-flows.png" width="100%" style="max-width: 100%;"/>
 </p>
 
-**LLM Compressor** is an easy-to-use library for optimizing large language models for deployment with vLLM, enabling up to **5X faster, cheaper inference**. It provides a comprehensive toolkit for:
+## Recent Updates
 
-- Applying a wide variety of compression algorithms, including weight and activation quantization, pruning, and more
-- Seamlessly integrating with Hugging Face Transformers, Models, and Datasets
-- Using a `safetensors`-based file format for compressed model storage that is compatible with `vLLM`
-- Supporting performant compression of large models via `accelerate`
+!!! info "Llama4 Quantization Support"
+    Quantize a Llama4 model to [W4A16](examples/quantization_w4a16.md) or [NVFP4](examples/quantization_w4a16.md). The checkpoint produced can seamlessly run in vLLM.
+
+!!! info "Large Model Support with Sequential Onloading"
+    As of llm-compressor>=0.6.0, you can now quantize very large language models on a single GPU. Models are broken into disjoint layers which are then onloaded to the GPU one layer at a time. For more information on sequential onloading, see [Big Modeling with Sequential Onloading](examples/big_models_with_sequential_onloading.md) as well as the [DeepSeek-R1 Example](examples/quantizing_moe.md).
+
+!!! info "Preliminary FP4 Quantization Support"
+    Quantize weights and activations to FP4 and seamlessly run the compressed model in vLLM. Model weights and activations are quantized following the NVFP4 [configuration](https://github.com/neuralmagic/compressed-tensors/blob/f5dbfc336b9c9c361b9fe7ae085d5cb0673e56eb/src/compressed_tensors/quantization/quant_scheme.py#L104). See examples of [weight-only quantization](examples/quantization_w4a16_fp4.md) and [FP4 activation support](examples/quantization_w4a4_fp4.md). Support is currently preliminary and additional support will be added for MoEs.
+
+!!! info "Updated AWQ Support"
+    Improved support for MoEs with better handling of larger models.
+
+!!! info "Axolotl Sparse Finetuning Integration"
+    Seamlessly finetune sparse LLMs with our Axolotl integration. Learn how to create [fast sparse open-source models with Axolotl and LLM Compressor](https://developers.redhat.com/articles/2025/06/17/axolotl-meets-llm-compressor-fast-sparse-open). See also the [Axolotl integration docs](https://docs.axolotl.ai/docs/custom_integrations.html#llmcompressor).
+
+For more information, check out the [latest release on GitHub](https://github.com/vllm-project/llm-compressor/releases/latest).
 
 ## Key Features
```