Closed
Labels
bug: Something isn't working
Description
I'm trying to incorporate MLX-Audio into AIChat to generate a conversation between two LLMs as speech.
However, memory consumption grows indefinitely as I generate more audio. I don't have this problem with kokoro-onnx.
I'm not sure whether the issue is in my code or in MLX-Audio, but it doesn't seem to release the previous generation from memory after I append the samples to an array.
Here's minimal code that reproduces the issue:
```python
from mlx_audio.tts.utils import load_model
import numpy as np
import codecs
import random
import json
import soundfile as sf


def random_pause(min_duration=0.5, max_duration=1.0, sample_rate=24000):
    silence_duration = random.uniform(min_duration, max_duration)
    silence = np.zeros(int(silence_duration * sample_rate))
    return silence


def generate_audio(text, voice, l, r, speed=1.0, lang="a", sample_rate=24000):
    res = tts.generate(
        text=text,
        voice=voice,
        speed=speed,
        lang_code=lang,
        temperature=0.7,
        verbose=False,
    )
    samples = list(res)[0].audio
    pause = random_pause(sample_rate=sample_rate)
    samples = np.concatenate([samples, pause])
    samples = np.column_stack((samples * l, samples * r))
    return samples


tts = load_model("mlx-community/Kokoro-82M-bf16")
a_voice = "af_sky"
b_voice = "af_heart"
chat = json.load(codecs.open("chat.json", "r", "utf-8"))
wavs = []
for i in range(0, len(chat), 2):
    content = chat[i]["content"]
    print(a_voice, content)
    samples = generate_audio(content, a_voice, 0.8, 1.0)
    wavs.append(samples)
    content = chat[i + 1]["content"]
    print(b_voice, content)
    samples = generate_audio(content, b_voice, 1.0, 0.8)
    wavs.append(samples)
wav = np.concatenate(wavs)
sf.write("podcast.wav", wav, 24000)
```

I'm also attaching the chat history that you can use to reproduce.
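In case it helps isolate the leak: here is a NumPy-only sketch of the same pause/pan/concatenate post-processing, with a synthetic sine standing in for the TTS output (the `post_process` helper and `SAMPLE_RATE` constant are my names, not part of MLX-Audio). Run on its own this only ever holds plain NumPy buffers, which suggests the growth comes from the generation side rather than the array handling.

```python
import random

import numpy as np

SAMPLE_RATE = 24000


def random_pause(min_duration=0.5, max_duration=1.0, sample_rate=SAMPLE_RATE):
    # Random-length silence, same as in the reproducer above.
    silence_duration = random.uniform(min_duration, max_duration)
    return np.zeros(int(silence_duration * sample_rate))


def post_process(samples, l, r, sample_rate=SAMPLE_RATE):
    # Append a pause, then pan into a stereo (N, 2) array.
    samples = np.concatenate([samples, random_pause(sample_rate=sample_rate)])
    return np.column_stack((samples * l, samples * r))


# Synthetic one-second 440 Hz sine stands in for one TTS generation.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
mono = np.sin(2 * np.pi * 440.0 * t)

wavs = [post_process(mono, 0.8, 1.0), post_process(mono, 1.0, 0.8)]
wav = np.concatenate(wavs)
print(wav.shape)  # (total_samples, 2): stereo, with pauses appended
```

One thing that might also be worth trying on the MLX side is calling `mx.clear_cache()` (or `mx.metal.clear_cache()` on older MLX versions) between generations, since MLX keeps freed buffers in a cache by default; I haven't verified whether that accounts for the growth here.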
Thanks!