Conversation

@AbrahamSanders (Contributor) commented Sep 19, 2025

Hey, thanks for the great work!

This PR addresses #8 and #21 by adding a streaming API to the VoxCPM and VoxCPMModel classes.

model.generate continues to return the complete audio, while model.generate_streaming returns a generator that yields a single 0.08s chunk at each generation step.

To keep streaming smooth, the last 3 latents are passed to the audio VAE for context and only the last 1280-sample chunk is returned. Three latents seemed to be the minimum context needed to remove audible artifacts from the streamed audio (I determined this by ear; feel free to adjust if you feel it's not enough).

import numpy as np
import soundfile as sf
from voxcpm import VoxCPM
from transformers.trainer_utils import set_seed

model = VoxCPM.from_pretrained("openbmb/VoxCPM-0.5B")
text = "Streaming text to speech is easy with VoxCPM!"

# Non-streaming
set_seed(42)
final = model.generate(text)
sf.write("output.wav", final, model.tts_model.sample_rate)

# Streaming
set_seed(42)
chunks = []
for chunk in model.generate_streaming(text):
    chunks.append(chunk)
final = np.concatenate(chunks)
sf.write("output_streaming.wav", final, model.tts_model.sample_rate)

The two results should be audibly identical!
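
For reference, a minimal sketch of the chunk-decoding idea described above; the helper name and argument are mine, not the actual VoxCPM internals:

import torch

def decode_streaming_chunk(audio_vae, pred_feat_seq, chunk_samples=1280):
    # Decode the last 3 predicted latents together so the VAE has enough
    # context to avoid audible artifacts at chunk boundaries...
    context = torch.cat(pred_feat_seq[-3:], dim=1)
    audio = audio_vae.decode(context.to(torch.float32))
    # ...but keep only the 0.08s of samples produced by the newest latent.
    return audio[..., -chunk_samples:].squeeze(1).cpu()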

Copilot AI left a comment


Pull Request Overview

This PR adds a streaming API to the VoxCPM text-to-speech model, allowing users to receive audio chunks incrementally during generation instead of waiting for the complete audio. The streaming implementation maintains audio quality by using overlapping context between chunks.

  • Introduces generate_streaming methods that yield audio chunks as generators
  • Refactors existing generation methods to use internal _generate implementations that support both streaming and non-streaming modes (see the sketch after this list)
  • Adds streaming support across the entire inference pipeline from the core VoxCPM class down to the model internals
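
A rough, self-contained sketch of that pattern (toy class; VoxCPM's actual class names, signatures, and inference loop differ):

import numpy as np

class StreamingTTS:
    def _generate(self, text: str):
        # Internal generator shared by both public methods; each step yields
        # one small audio chunk (silence here, purely for illustration).
        for _ in text.split():
            yield np.zeros(1280, dtype=np.float32)

    def generate_streaming(self, text: str):
        # Streaming mode: hand chunks to the caller as they are produced.
        yield from self._generate(text)

    def generate(self, text: str):
        # Non-streaming mode: drain the generator and return the full waveform.
        return np.concatenate(list(self._generate(text)))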

Reviewed Changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 4 comments.

Changed files:
  • src/voxcpm/model/voxcpm.py: refactors generation and inference methods to support streaming; adds context management for smooth audio chunks
  • src/voxcpm/core.py: updates the VoxCPM class to expose the streaming API and handle numpy array conversion for streaming chunks
  • README.md: adds documentation and example usage for the new streaming functionality


def optimize(self, disable: bool = False):
    try:
        if disable:
            raise ValueError("Optimization disabled by user")

Copilot AI Sep 22, 2025


The error message 'Optimization disabled by user' is misleading: this isn't an error condition but expected behavior when disable=True. Consider removing the artificial ValueError and simply returning early, or using a more descriptive message like 'Optimization skipped as requested'.

Suggested change:
-            raise ValueError("Optimization disabled by user")
+            print("Optimization skipped as requested")
+            return self
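
The early-return variant the comment describes would look roughly like this (a sketch; the actual optimization body is elided):

def optimize(self, disable: bool = False):
    # Skip optimization explicitly instead of raising an artificial error.
    if disable:
        print("Optimization skipped as requested")
        return self
    try:
        ...  # real optimization of the model goes here
    except Exception as e:
        print(f"Optimization failed, running unoptimized: {e}")
    return self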

Comment on lines +634 to +635
# return the last three predicted latent features to provide enough context for smooth decoding
pred_feat_chunk = torch.cat(pred_feat_seq[-3:], dim=1)

Copilot AI Sep 22, 2025


The magic number 3 for context features is hardcoded but mentioned in the PR description as adjustable. Consider making this a configurable parameter or at least extracting it to a named constant like STREAMING_CONTEXT_SIZE = 3 for better maintainability.
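
Applying that suggestion would look roughly like this (constant name taken from the comment above):

# Number of trailing latents the audio VAE needs for artifact-free decoding.
STREAMING_CONTEXT_SIZE = 3

pred_feat_chunk = torch.cat(pred_feat_seq[-STREAMING_CONTEXT_SIZE:], dim=1)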

patch_len = self.patch_size * self.chunk_size
for latent_pred, _ in inference_result:
    decode_audio = self.audio_vae.decode(latent_pred.to(torch.float32))
    decode_audio = decode_audio[..., -patch_len:].squeeze(1).cpu()

Copilot AI Sep 22, 2025


The patch_len calculation and slicing logic [..., -patch_len:] is duplicated in multiple places (lines 296, 509). Consider extracting this into a helper method to reduce code duplication and improve maintainability.
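
One possible helper for the duplicated slicing logic (method name is hypothetical, not from the PR):

def _decode_last_patch(self, latent_pred: torch.Tensor) -> torch.Tensor:
    # Decode a latent sequence and keep only the newest patch of samples,
    # mirroring the slicing done at the call sites mentioned above.
    patch_len = self.patch_size * self.chunk_size
    audio = self.audio_vae.decode(latent_pred.to(torch.float32))
    return audio[..., -patch_len:].squeeze(1).cpu()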

zipenhancer_model_path: ModelScope acoustic noise suppression model
    id or local path. If None, denoiser will not be initialized.
enable_denoiser: Whether to initialize the denoiser pipeline.
optimize: Whether to optimize the model with torch.compile. True by default, but can be disabled for debugging.

Copilot AI Sep 22, 2025


The parameter description mentions 'torch.compile' but the actual optimization logic in the model uses different mechanisms (torch_tensorrt, torch2trt). The documentation should accurately describe what optimization actually does or be more generic.

Suggested change:
-    optimize: Whether to optimize the model with torch.compile. True by default, but can be disabled for debugging.
+    optimize: Whether to optimize the model for inference speed. True by default, but can be disabled for debugging.

@liuxin99 merged commit b0714ad into OpenBMB:main on Sep 22, 2025
@dignome mentioned this pull request on Oct 12, 2025