MARS6 is a compact autoregressive TTS system based on a hierarchical neural audio codec. It uses a two-stage decoder to predict coarse-to-fine tokens at only 12 Hz, leading to fast inference while preserving high audio quality and speaker similarity. MARS6 achieves robust zero-shot voice cloning and expressive synthesis even on challenging in-the-wild references. This repository provides a lightweight implementation for inference using our public turbo checkpoints.
Below is a high-level diagram of MARS6 from our paper. The encoder processes the text and speaker embeddings (from an external speaker encoder), producing a sequence of latent features. The hierarchical decoder operates at a low 12 Hz "global" level while autoregressively expanding each frame into multiple discrete codec tokens with a small "local" decoder.
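To make the rate arithmetic concrete, here is a small illustrative sketch of the hierarchical token budget. It is not the model code: the 12 Hz global rate comes from the description above, but the number of local codec tokens per global frame is a hypothetical example value.

```python
# Illustrative sketch of MARS6's coarse-to-fine token budget (NOT the actual
# model code). GLOBAL_RATE_HZ = 12 is from the paper; tokens_per_frame is a
# hypothetical example, not the real SNAC token count.

GLOBAL_RATE_HZ = 12  # coarse "global" decoder steps per second of audio

def token_budget(duration_s: float, tokens_per_frame: int) -> tuple:
    """Return (global_frames, total_local_tokens) for a clip of duration_s.

    The global decoder runs once per 12 Hz frame; the small local decoder
    then autoregressively expands each frame into tokens_per_frame codec tokens.
    """
    global_frames = round(duration_s * GLOBAL_RATE_HZ)
    return global_frames, global_frames * tokens_per_frame

if __name__ == "__main__":
    frames, tokens = token_budget(duration_s=5.0, tokens_per_frame=7)
    print(frames, tokens)  # 60 global steps, 420 local codec tokens
```

The key point is that the expensive global model only runs at 12 Hz; the cheap local decoder does the per-frame expansion, which is what keeps inference fast.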
- Project Page: MARS6-Turbo Demo & Samples
- arXiv Paper: MARS6: A Small and Robust Hierarchical-Codec Text-to-Speech Model
- Website: https://camb.ai
This section outlines how to install and run MARS6 for inference. You can either clone this repository and install dependencies or load MARS6 directly via Torch Hub.
- Clone this repo:

```bash
git clone https://github.com/Camb-ai/mars6-turbo.git
cd mars6-turbo
```

- Install the required dependencies:

```bash
pip install snac msclap ipykernel iprogress
```

Make sure you have a modern version of Python (3.9+); it is best practice to use a conda environment or a Python venv.
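If you want to sanity-check that the dependencies above installed correctly, a quick (optional) snippet like the following reports any that are missing; the package list mirrors the `pip install` step above.

```python
import importlib.util

# Importable package names from the install step above.
REQUIRED = ["snac", "msclap", "ipykernel"]

def missing_packages(names):
    """Return the subset of package names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All MARS6 dependencies found.")
```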
Use `inference.py` for direct command-line usage:

```bash
python inference.py --audio "referencepath.wav" --save_path "outputpath.wav" --text "Text we wish to output. All right here!" --transcript "Transcript of the reference. For if you wish to deep clone"
```

Or open `MARS6_turbo_inference_demo.ipynb` for a Jupyter notebook walkthrough.
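For batch synthesis, one option is to drive the CLI from Python. This is just a convenience sketch, not part of the repo: it builds an `inference.py` command using the flags shown above, with placeholder file paths.

```python
import subprocess  # only needed if you actually run the command
import sys
from typing import List, Optional

def synthesize_cmd(audio: str, text: str, save_path: str,
                   transcript: Optional[str] = None) -> List[str]:
    """Build an inference.py command line using the CLI flags shown above."""
    cmd = [sys.executable, "inference.py",
           "--audio", audio,
           "--save_path", save_path,
           "--text", text]
    if transcript is not None:
        # Providing the reference transcript enables "deep clone" mode.
        cmd += ["--transcript", transcript]
    return cmd

# To actually synthesize (run from the repo root):
# subprocess.run(synthesize_cmd("reference.wav", "Hello!", "out.wav"), check=True)
```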
- We use minBPE by Karpathy for byte-pair tokenization utilities.
- We were inspired by ideas from MEGABYTE for multi-scale token processing.
- We leverage SNAC for the discrete audio codec.
- WavLM and CLAP are used for speaker embeddings.
- Additional TTS references and techniques from VALL-E, StyleTTS2, XTTSv2, and more (see paper).
Thank you to the authors of these amazing works above for making this model possible!
We welcome any contributions that improve the model. We'd also love to see how you used MARS6-Turbo in different scenarios; please use the 🙌 Show and tell category in Discussions to share your examples.
Contribution format:
The preferred way to contribute is to fork the main repository on GitHub:
- Fork the repo on GitHub
- Clone your fork locally and set this repo as the upstream remote:

```bash
git remote add upstream [email protected]:Camb-ai/mars6-turbo.git
```
- Make a new local branch and make your changes, commit changes.
- Push your changes to a new branch on your fork:

```bash
git push --set-upstream origin <NAME-NEW-BRANCH>
```
- On GitHub, go to your fork and click 'Pull Request' to begin the PR process. Please make sure to include a description of what you did/fixed.
We're an ambitious, globally distributed team with a singular aim: making everyone's voice count. At CAMB.AI, our research team includes Interspeech-published authors, Carnegie Mellon alumni, and ex-Siri engineers, and we're looking for you to join us.
We're actively hiring; please drop us an email at [email protected] if you're interested. Visit our careers page for more info.
Join CAMB.AI community on Forum and Discord to share any suggestions, feedback, or questions with our team.
If you use this repository or MARS6 in your research, please cite our paper:

```bibtex
@inproceedings{mars6-2025icassp,
  author    = {Baas, Matthew and Scholtz, Pieter and Mehta, Arnav and Dyson, Elliott and Prakash, Akshat and Kamper, Herman},
  title     = {{MARS6}: A Small and Robust Hierarchical-Codec Text-to-Speech Model},
  booktitle = {IEEE ICASSP},
  year      = {2025}
}
```
Thank you for trying MARS6! For problems or questions, please open a GitHub issue.