
Conversation

subhankar-ghosh
Collaborator

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do?

Adds a streaming inference algorithm to MagpieTTS.

Collection: TTS

  • Add specific line-by-line info of high-level changes in this PR.

Usage

  • Example invocation:
python scripts/magpietts/infer_and_evaluate_streaming.py \
--checkpoint_files ${CKPT} \
--hparams_files ${HPARAM} \
--codecmodel_path ${CODEC} \
--out_dir ${OUT_DIR} \
--datasets ${DATASET} \
--use_cfg \
--disable_fcd \
--apply_attention_prior

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and re-add the label.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

subhankar-ghosh and others added 19 commits July 30, 2025 20:12
Signed-off-by: subhankar-ghosh <[email protected]>
Signed-off-by: subhankar-ghosh <[email protected]>
Signed-off-by: subhankar-ghosh <[email protected]>
Signed-off-by: subhankar-ghosh <[email protected]>
subhankar-ghosh and others added 4 commits August 25, 2025 08:22
Signed-off-by: subhankar-ghosh <[email protected]>
Signed-off-by: subhankar-ghosh <[email protected]>
Signed-off-by: subhankar-ghosh <[email protected]>
Signed-off-by: subhankar-ghosh <[email protected]>
@github-actions github-actions bot removed the common label Aug 25, 2025
Signed-off-by: subhankar-ghosh <[email protected]>
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds streaming inference functionality to MagpieTTS, enabling real-time text-to-speech generation by processing text input incrementally rather than all at once. The streaming algorithm maintains sliding windows for both text and audio history while managing attention priors to ensure coherent audio generation across text chunks.
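The sliding-window mechanics described above can be sketched as follows. This is a minimal illustration of the idea, not the PR's actual implementation: the names (stream_tts, generate_frames), the window sizes, and the decode step are all hypothetical.

```python
from collections import deque

def stream_tts(text_chunks, generate_frames, text_window=64, audio_window=256):
    """Illustrative sliding-window streaming loop (hypothetical names).

    text_chunks:     iterable of token lists arriving incrementally
    generate_frames: stand-in for the model's decode step; takes the
                     current text window, the audio history, and a
                     left_offset counting text tokens dropped so far
    """
    text_buf = deque(maxlen=text_window)    # sliding text window
    audio_buf = deque(maxlen=audio_window)  # sliding audio history
    left_offset = 0                         # text tokens dropped from the left

    for chunk in text_chunks:
        for tok in chunk:
            if len(text_buf) == text_buf.maxlen:
                left_offset += 1            # window slides right by one token
            text_buf.append(tok)
        # Decode audio for the currently visible text, conditioned on history;
        # left_offset lets the model map window positions back to global ones.
        frames = generate_frames(list(text_buf), list(audio_buf), left_offset)
        audio_buf.extend(frames)
        yield frames
```

A caller would iterate over the generator and play each yielded frame batch as it arrives, which is what makes the generation incremental rather than all-at-once.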

Key Changes:

  • Implementation of streaming inference algorithm with windowing mechanisms for text and audio tokens
  • Addition of specialized attention prior handling for streaming mode with exponential weight support
  • Extraction of common argument parsing functionality to support both streaming and non-streaming inference scripts
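The "attention prior with exponential weight support" is only described at a high level here. One plausible construction, assuming a near-diagonal text-audio alignment whose diagonal is shifted by left_offset, might look like the sketch below; the function name, the decay scheme, and the normalization are assumptions, not the PR's code.

```python
import numpy as np

def streaming_attention_prior(n_audio, n_text, left_offset=0, decay=0.9):
    """Hypothetical exponential attention prior for streaming decoding.

    The weight falls off exponentially with distance from the expected
    near-diagonal alignment; left_offset shifts the diagonal to account
    for text tokens already dropped from the sliding window.
    """
    audio_pos = np.arange(n_audio)[:, None]   # shape (n_audio, 1)
    text_pos = np.arange(n_text)[None, :]     # shape (1, n_text)
    # Expected global text index for each audio frame.
    expected = audio_pos * (n_text + left_offset) / max(n_audio, 1)
    # Distance of each (frame, token) pair from the expected alignment.
    dist = np.abs((text_pos + left_offset) - expected)
    prior = decay ** dist                     # exponential falloff
    return prior / prior.sum(axis=1, keepdims=True)  # per-frame normalization
```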

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

  • scripts/magpietts/infer_and_evaluate_streaming.py — New streaming inference script with chunked text processing and windowed generation
  • nemo/collections/tts/models/magpietts.py — Core streaming methods, including windowed text processing and streaming-specific attention prior construction
  • scripts/magpietts/infer_and_evaluate.py — Refactored to extract common argument-parsing logic; removed combined violin-plot functionality
  • scripts/magpietts/README.md — Added documentation and a usage example for the new streaming inference capability


Comment on lines +1979 to +1980
It also uses a end_of_text flag to indicate whether the text has ended.
It also uses a left_offset to account for the fact that the text is not provided in a single chunk.

Copilot AI Aug 25, 2025


The docstring repeats the explanation about left_offset twice. The last two sentences are redundant and should be consolidated.

Suggested change:
- It also uses a end_of_text flag to indicate whether the text has ended.
- It also uses a left_offset to account for the fact that the text is not provided in a single chunk.
+ It also uses an end_of_text flag to indicate whether the text has ended.


@rfejgin
Collaborator

rfejgin commented Aug 27, 2025

Hi Subhankar, nice work!

I haven't dived into all the details, but here are a few things that come to mind:

  1. There seems to be some code duplication between the streaming and non-streaming versions. I wonder if it will become hard to maintain over time. Specifically, in:
  • infer_and_evaluate_streaming.py vs infer_and_evaluate.py — things like setting up the checkpoint name, logging of metrics, etc. I do see that it reuses certain functions from infer_and_evaluate.py, but maybe there is more commonality to extract?
  • in magpietts.py does construct_streaming_inference_prior() have major differences (that we actively use) from construct_streaming_prior() aside from including the offset?

I know that there is a tradeoff between eliminating code duplication vs making unified code overly complex, but maybe the above are worth another look?

  2. A README or a pointer to documentation on the design of the streaming algorithm would be of interest, since it's non-trivial.
  3. More of a minor point, but I wonder if logic that needs model-specific details, like creating a BOS token, would be better placed inside the MagpieTTSModel class (accessible to the external script via some API). That way it would also be easier to reuse when folks call infer_batch() directly (not through infer_and_evaluate_streaming.py), which is what I believe they do in Riva.
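That last point could look something like the sketch below. The class body, the method name make_bos_frame, and the token value are invented for illustration; this is not the actual MagpieTTSModel API.

```python
# Hypothetical refactor: keep token-layout details inside the model so that
# external scripts (and direct infer_batch() callers) never build BOS by hand.
class MagpieTTSModel:
    BOS_TOKEN_ID = 0  # assumed value, for illustration only

    def make_bos_frame(self, batch_size: int):
        """Return a batch of BOS frames; callers need no token-layout knowledge."""
        return [[self.BOS_TOKEN_ID] for _ in range(batch_size)]

# A caller would then write:
#     bos = model.make_bos_frame(batch_size)
# instead of constructing the BOS token from model internals.
```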
