Description:
Users running iterative or long text-to-speech generation tasks with Kokoro and other models could experience significantly high and growing memory usage. This was primarily due to the MLX framework's internal cache accumulating tensor data (such as KV caches and activations) across multiple generation segments or iterations without being cleared. In testing scenarios, the MLX cache memory was observed to grow to ~70-73 GB and remain high, leading to a substantial memory footprint for the overall Python process.

Solution ✅
This PR addresses the high memory consumption by strategically calling `mx.core.clear_cache()` within the `Model.generate` method, after each audio segment is produced. This ensures that memory allocated by the MLX cache for one segment is released before processing the next, preventing unbounded growth. A sketch of the change appears below.
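For illustration only, a minimal sketch of where the call sits, assuming a segment loop inside `Model.generate`; `_split_segments` and `_generate_segment` are hypothetical stand-ins, not the actual model internals:

```python
import mlx.core as mx

class Model:
    def generate(self, text, **kwargs):
        # Hypothetical segment loop; the real method's structure differs.
        for segment in self._split_segments(text):
            audio = self._generate_segment(segment, **kwargs)
            yield audio
            # Release MLX's internal buffer cache (KV caches, activations)
            # so it cannot accumulate across segments.
            mx.clear_cache()  # i.e. mlx.core.clear_cache()
```

Note that `clear_cache()` only releases buffers MLX is holding for reuse, not live arrays, so calling it between segments does not invalidate results already produced.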
Impact & Results:
With this change, the MLX cache no longer accumulates across segments, giving applications that use the Kokoro TTS model a drastically reduced and stable memory footprint: per-iteration cache memory drops from tens of gigabytes to a few megabytes, making the model significantly more memory-efficient for sequential generation tasks.
Considerations:
While clearing the cache prevents excessive memory usage, it means that MLX cannot reuse computations cached from previous segments. This might have a minor performance impact on per-segment generation speed. However, for typical TTS applications where memory stability and the ability to process long texts are crucial, the benefits of reduced memory usage outweigh this potential trade-off.
How to Verify:
Run a script that iteratively calls `tts.generate()` (or a function wrapping it) for multiple text inputs or segments, monitoring `mx.core.get_cache_memory()` and overall process memory (e.g., using `psutil`). The cache memory should remain low and stable after each generation call, and the overall process memory should not grow uncontrollably due to MLX cache accumulation. A sketch of such a script follows.
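A minimal verification sketch, assuming a `tts` object exposing a `generate(text)` method; how you construct that object, and whether its output needs to be consumed, depends on your setup:

```python
import mlx.core as mx
import psutil

def check_cache_stability(tts, texts):
    """Generate each text and report MLX cache and process memory afterwards."""
    proc = psutil.Process()
    for i, text in enumerate(texts):
        tts.generate(text)  # iterate/consume the result if your wrapper yields segments
        cache_mb = mx.get_cache_memory() / 1e6   # MLX buffer cache, in MB
        rss_mb = proc.memory_info().rss / 1e6    # whole-process resident set, in MB
        print(f"iter {i}: MLX cache {cache_mb:.1f} MB, process RSS {rss_mb:.1f} MB")
```

With the fix in place, the reported cache size should stay in the low megabytes on every iteration rather than climbing toward the ~70 GB figure described above.

Checklist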