@lucasnewman
Collaborator

This allows reducing time-to-first-speech for OuteTTS, which can play in realtime using the 8-bit quantized model on my M3 Max.

mlx_lm.generate --model mlx-community/Llama-3.2-1B-Instruct-4bit --verbose false --temp 0 --max-tokens 512 --prompt "Write one sentence on wavelets." |\
python -m mlx_audio.tts.generate --model mlx-community/OuteTTS-1.0-0.6B-8bit --stream --verbose

@lucasnewman lucasnewman requested a review from Blaizzy May 19, 2025 23:29
@lin72h

lin72h commented May 19, 2025

@lucasnewman Thanks for perfecting this. It's very usable.

@Blaizzy
Owner

Blaizzy commented May 19, 2025

Thanks @lucasnewman, this is awesome!

Could you try and add some overlap similar to the demo implementation we came up with a month ago?

I think it will help with the cuts and slight noise when going over disjoint segments. Also, not sure but the audio player might need an update to play with overlaps.

https://gist.github.com/Blaizzy/c18a52509fec3cbc5cbd64b07e33ca1c
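To make the overlap idea concrete, here is a minimal sketch of a linear crossfade between neighbouring segments. It is not the code from the gist or from this PR; `crossfade` and `stitch` are hypothetical names, and it assumes every segment is a plain list of float samples at least `overlap` samples long.

```python
def crossfade(prev_tail, next_head):
    """Linearly blend two equal-length overlapping sample lists."""
    n = len(prev_tail)
    if n != len(next_head):
        raise ValueError("overlap regions must have the same length")
    out = []
    for i in range(n):
        # Weight of the incoming segment, rising from near 0 to near 1.
        w = (i + 1) / (n + 1)
        out.append((1.0 - w) * prev_tail[i] + w * next_head[i])
    return out


def stitch(segments, overlap):
    """Join segments whose first `overlap` samples repeat the previous tail."""
    if not segments:
        return []
    result = list(segments[0])
    for seg in segments[1:]:
        if overlap > 0:
            # Replace the previous tail with the blended overlap region.
            result[-overlap:] = crossfade(result[-overlap:], seg[:overlap])
        result.extend(seg[overlap:])
    return result
```

With a streaming player, the last `overlap` samples of each segment would have to be held back until the next segment arrives, which is likely why the player would need an update too.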

@lucasnewman
Collaborator Author

It decodes the entire segment for each chunk, so it’s effectively 100% overlap. Because Descript doesn’t really support “streaming” chunk-wise, I compiled the decoder loop to mitigate the performance hit of the full decode. If the audio player is running dry, try turning up the streaming interval. It’s device-specific and we’re not doing any ahead-of-time buffering, so if generation is slower than realtime there will always be gaps.
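As a rough illustration of "decode everything, emit only the new part", here is a sketch under stated assumptions. It is not the PR's implementation: `stream_by_full_redecode` and `toy_decode` are hypothetical, and the toy decoder simply maps each token to three samples so the example runs on its own.

```python
def stream_by_full_redecode(tokens, decode, interval):
    """Re-decode the whole prefix every `interval` tokens; yield only new samples."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    emitted = 0
    for end in range(interval, len(tokens) + interval, interval):
        prefix = tokens[:min(end, len(tokens))]
        audio = decode(prefix)   # full decode of everything generated so far
        yield audio[emitted:]    # hand the player just the unseen tail
        emitted = len(audio)


def toy_decode(tokens):
    """Hypothetical stand-in for a codec decoder: three samples per token."""
    return [float(t) for t in tokens for _ in range(3)]


chunks = list(stream_by_full_redecode([1, 2, 3, 4, 5], toy_decode, interval=2))
```

Because each step decodes the full prefix, the total decode work grows roughly quadratically with the number of chunks, so a larger interval means fewer, cheaper-overall decodes at the cost of higher latency per chunk.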

We could add some kind of configurable start delay to allow ahead-of-time buffering, or try a simple heuristic that looks at the samples/sec generation speed and buffers accordingly, but it gets complex quickly. I think it might be better handled as a follow-up that enhances the audio player's buffering capabilities.
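For reference, the simplest version of that heuristic could look like the sketch below. It assumes a constant generation speed and a known total audio length, neither of which really holds in practice, and `start_delay_seconds` is a hypothetical helper rather than anything in the audio player. Generating `T` seconds of audio at a realtime factor `r < 1` takes `T / r` seconds, so playback has to start at least `T * (1 / r - 1)` seconds late to avoid running dry.

```python
def start_delay_seconds(samples_per_sec, sample_rate, audio_seconds):
    """Minimum start delay so playback never outruns generation.

    Assumes generation speed is constant and the total length is known.
    """
    if samples_per_sec <= 0 or sample_rate <= 0:
        raise ValueError("rates must be positive")
    speed = samples_per_sec / sample_rate  # realtime factor (1.0 == realtime)
    if speed >= 1.0:
        return 0.0  # generation keeps up, so no pre-buffering is needed
    return audio_seconds * (1.0 / speed - 1.0)


# Example with an assumed 24 kHz output generated at half of realtime speed.
delay = start_delay_seconds(samples_per_sec=12_000, sample_rate=24_000, audio_seconds=10)
```

Even this toy version needs the total duration up front, which a streaming generator does not know, so it mostly shows why the problem gets complex quickly.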

Owner

@Blaizzy Blaizzy left a comment

LGTM!

Phenomenal work 🚀

@Blaizzy Blaizzy merged commit 1eb879e into Blaizzy:main May 24, 2025
2 checks passed
@lucasnewman lucasnewman deleted the outetts-streaming branch August 17, 2025 18:53