This repository contains the official implementation of CommVQ, a method for memory-efficient long-context inference via KV cache quantization with learned codebooks. It achieves strong accuracy across a wide range of benchmarks while significantly reducing KV cache memory overhead.
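To build intuition for what codebook-based KV cache quantization does, here is a minimal vector-quantization sketch in PyTorch. It is illustrative only and is not CommVQ's algorithm (which uses learned commutative codebooks, described in the paper); all shapes and names are hypothetical.

```python
# Minimal sketch of codebook-based (vector) quantization for a KV tensor.
# Illustrative only: CommVQ itself uses learned commutative codebooks.
import torch

def vq_encode(x: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Map each row of x to the index of its nearest codebook entry."""
    # x: (num_vectors, dim), codebook: (num_codes, dim)
    dists = torch.cdist(x, codebook)   # (num_vectors, num_codes)
    return dists.argmin(dim=-1)        # small integer codes instead of floats

def vq_decode(codes: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Reconstruct an approximation of x by codebook lookup."""
    return codebook[codes]

kv = torch.randn(1024, 128)            # e.g. 1024 cached key vectors
codebook = torch.randn(256, 128)       # 256 entries -> 8-bit codes
codes = vq_encode(kv, codebook)
kv_hat = vq_decode(codes, codebook)
print(codes.dtype, kv_hat.shape)       # torch.int64, torch.Size([1024, 128])
```

Storing small integer codes plus one shared codebook, rather than full-precision vectors for every cached token, is where the memory saving comes from.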
- [June, 2025]: Released code and model weights.
- [May, 2025]: CommVQ is accepted to ICML 2025! See you in Vancouver, BC.
We release the following LLaMA-3.1 8B checkpoints with CommVQ 1-bit and 2-bit compression. Both the key and value codebooks are provided below; the value codebooks are used together with the original (unchanged) model weights.
| Model Variant | Value Codebook | Key Codebook |
|---|---|---|
| LLaMA-3.1 8B + CommVQ 1-bit | 🤗 Hugging Face | 🤗 Hugging Face |
| LLaMA-3.1 8B + CommVQ 2-bit | 🤗 Hugging Face | 🤗 Hugging Face |
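As a sketch, the codebooks can also be fetched programmatically with `huggingface_hub`; the repo id below is a placeholder, so substitute the repository linked in the table above.

```python
# Sketch: download a codebook from the Hugging Face Hub.
# "your-org/commvq-llama3.1-8b-1bit-key" is a placeholder repo id;
# use the actual repository linked in the table above.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="your-org/commvq-llama3.1-8b-1bit-key")
print("codebook files downloaded to:", local_dir)
```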
```bash
# Create and activate a conda environment
conda create -n commvq python=3.10
conda activate commvq

# Install CommVQ in editable mode
pip install -e .

# Install FlashAttention (building it requires a CUDA toolchain)
pip install flash-attn --no-build-isolation
```
```bash
cd training

# Step 1: Collect KV cache
bash collect_kv.sh

# Step 2: Prepare scaling factors
python make_scale.py

# Step 3: Train the codebook for the key cache
bash quantize_key_cache.sh

# Step 4: Train the codebook for the value cache
bash finetune/llama3.1_8b_int1.sh
```
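For intuition on Step 1: with Hugging Face `transformers`, collecting a KV cache amounts to a forward pass with `use_cache=True` and saving the returned per-layer key/value tensors. The sketch below rests on that assumption only; `collect_kv.sh` is the authoritative implementation, and the model id and output path are placeholders.

```python
# Sketch of KV-cache collection (cf. Step 1). Illustrative only; see
# collect_kv.sh for the script actually used. The model id and output
# path are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B"   # placeholder
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tok("Some calibration text.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, use_cache=True)

# past_key_values holds the per-layer key/value tensors; depending on the
# transformers version it is a tuple of (key, value) pairs or a Cache object.
torch.save(out.past_key_values, "kv_cache_sample.pt")  # placeholder path
```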
```bash
cd evaluation/longbench

# Generate predictions with a CommVQ checkpoint
python pred.py --model $CHECKPOINT

# Compute LongBench scores over the prediction results
python eval.py --model $RESULT_DIR
```
```bash
cd evaluation/infiniteBench/src

# Download the evaluation datasets
bash scripts/download_dataset.sh

# Evaluate each task (passkey shown here)
bash run_passkey.sh

# Merge all results for each task into one jsonl file
cat ../results/commvq/preds_passkey_*.jsonl > ../results/commvq/preds_passkey.jsonl

# Compute the task scores
python compute_scores.py --task all --model_name commvq
```
```bash
cd evaluation/niah

# Run the Needle-in-a-Haystack evaluation
bash run.sh $CHECKPOINT
```
We implement Triton-based kernels that further optimize memory usage and realize actual memory savings with CommVQ at inference time. (Currently only LLaMA-3.1 8B with 1-bit quantization is supported; broader model support is under development.)
```bash
cd evaluation/memory_measurement

# Install the transformers fork with the Triton inference kernels
pip install -e ../../transformers_triton_infer

# Measure memory usage with a CommVQ checkpoint
bash eval_memory.sh $CHECKPOINT
```
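To sanity-check the savings independently of `eval_memory.sh` (whose exact methodology may differ), a generic PyTorch pattern is to read the peak-memory counters around a `generate()` call. In the sketch below, `model` and `inputs` are assumed to be a loaded causal LM and a tokenized prompt on the GPU.

```python
# Sketch: measure peak GPU memory around one generation call.
# Generic PyTorch pattern, not the repository's measurement script;
# `model` and `inputs` are assumed to already live on the GPU.
import torch

def peak_memory_gib(model, inputs, max_new_tokens: int = 512) -> float:
    """Return peak allocated GPU memory (GiB) for one generate() call."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=max_new_tokens)
    return torch.cuda.max_memory_allocated() / 1024**3
```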
If you find CommVQ useful in your research or applications, please consider citing:
```bibtex
@inproceedings{li2025commvq,
  title     = {CommVQ: Commutative Vector Quantization for KV Cache Compression},
  author    = {Junyan Li and Yang Zhang and Muhammad Yusuf Hassan and Talha Chafekar and Tianle Cai and Zhile Ren and Pengsheng Guo and Binazir Karimzadeh and Colorado J Reed and Chong Wang and Chuang Gan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning (ICML)},
  year      = {2025}
}
```