[TTS][Magpietts] Unify Longform and Standard Inference logic #15375
subhankar-ghosh merged 26 commits into main
Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>
Signed-off-by: subhankar-ghosh <subhankar-ghosh@users.noreply.github.com>
Pull request overview
Refactors MagpieTTS inference to use a single “chunked” inference path for both short and long texts, with dataset-driven automatic sentence chunking based on per-sample language thresholds.
Changes:
- Introduces language-aware thresholding and a unified `chunk_text_for_inference()` chunking utility (replacing the prior longform detection logic).
- Replaces `LongFormTTSInferenceDataset` with `ChunkedTTSInferenceDataset` and updates the inference runner to always use the unified multi/single-chunk loop.
- Updates the CLI/example script to remove explicit longform args and align with the unified inference flow.
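The idea of language-aware thresholding can be illustrated with a minimal sketch. This is not the PR's `chunk_text_for_inference()` implementation: the function name `sketch_chunk_text`, the threshold values, and the naive regex sentence splitter are all assumptions made for illustration. Short texts come back as a single chunk, and longer texts are split at sentence boundaries and greedily packed up to the per-language word threshold.

```python
import re
from typing import List

# Hypothetical per-language word thresholds; the real values used in the PR
# are not shown in this conversation.
HYPOTHETICAL_WORD_THRESHOLDS = {"en": 45, "de": 40}
DEFAULT_WORD_THRESHOLD = 40


def sketch_chunk_text(text: str, language: str) -> List[str]:
    """Return [text] for short inputs, otherwise sentence-packed chunks."""
    threshold = HYPOTHETICAL_WORD_THRESHOLDS.get(language, DEFAULT_WORD_THRESHOLD)
    text = text.strip()
    if len(text.split()) <= threshold:
        # Short text: a single chunk, handled by the same downstream loop.
        return [text]

    # Naive sentence split on '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate.split()) > threshold:
            # Adding this sentence would exceed the threshold: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Because the short-text case is simply a one-element list, callers never need a separate "standard" branch, which is the point of unifying the two paths.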
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| nemo/collections/tts/parts/utils/tts_dataset_utils.py | Adds language-aware sentence splitting, thresholds, tokenizer mapping, and unified chunking helper. |
| nemo/collections/tts/data/text_to_speech_dataset.py | Replaces longform inference dataset with unified chunked inference dataset + mixed-chunk collation. |
| nemo/collections/tts/modules/magpietts_inference/inference.py | Removes standard/longform branching; always runs unified chunk loop via generate_speech(). |
| nemo/collections/tts/models/magpietts.py | Renames longform state/config to chunked equivalents; updates do_tts() to the unified chunked generation path. |
| examples/tts/magpietts_inference.py | Removes longform CLI controls and updates messaging for unified chunking behavior. |
| tests/collections/tts/parts/utils/test_tts_dataset_utils.py | Adds unit tests for new thresholds and unified chunking helper. |
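The "always runs unified chunk loop" behavior summarized in the table can be pictured roughly as follows. This is a hedged sketch, not the code in `inference.py` or `magpietts.py`: `SketchChunkState`, `sketch_generate`, and the `synthesize_chunk` callback are hypothetical names, and the real decoder state carries model-specific tensors rather than a text history.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SketchChunkState:
    """Hypothetical stand-in for state carried from one chunk to the next."""
    chunks_done: int = 0
    history: List[str] = field(default_factory=list)


def sketch_generate(
    chunks: List[str],
    synthesize_chunk: Callable[[str, SketchChunkState], List[float]],
) -> List[float]:
    """Run one loop for any number of chunks; a short text is just len(chunks) == 1."""
    state = SketchChunkState()
    audio: List[float] = []
    for chunk in chunks:
        # The callback may read the state (e.g. context from earlier chunks).
        audio.extend(synthesize_chunk(chunk, state))
        state.chunks_done += 1
        state.history.append(chunk)
    return audio


def fake_synthesizer(chunk: str, state: SketchChunkState) -> List[float]:
    # Placeholder "audio": one sample per chunk, equal to its character count.
    return [float(len(chunk))]
```

With this shape, `sketch_generate(["hi"], fake_synthesizer)` and a multi-chunk call go through exactly the same code, so no standard/longform branch is needed.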
The GitHub UI still says that there are conflicts.
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Signed-off-by: Subhankar Ghosh <subhankar2321@gmail.com>
…o into magpietts_longform_unify
blisc left a comment
Please fix the linting errors
tests/functional_tests/L2_TTS_InferEvaluate_Magpietts_SeenSpeakers.sh
--hparams_files /home/TestData/tts/2602_MagpieTTS/hprams_hi_char.yaml \
--checkpoint_files /home/TestData/tts/2602_MagpieTTS/Magpie-TTS-ML-V1--val_cer_gt=0.3258-step=1000.ckpt \
If this is the same checkpoint as in tests/functional_tests/L2_TTS_InferEvaluate_Magpietts_SeenSpeakers.sh, we can remove this test. It was only kept to test a different checkpoint.
We can remove this test now, since it is identical to L2_TTS_InferEvaluate_Magpietts_SeenSpeakers.sh.
tests/functional_tests/L2_TTS_InferEvaluatelongform_Magpietts_ZeroShot.sh
For this PR, it is important to merge main into this branch, as it is out of date.
Fixed the necessary linting issues. The current linting failures are due to CI issues.
[🤖]: Hi @subhankar-ghosh 👋, we wanted to let you know that a CICD pipeline for this PR just finished successfully, so it might be time to merge this PR or get some approvals.
What does this PR do?
This pull request refactors and unifies the text chunking and inference logic for TTS (Text-to-Speech) in the MagpieTTS pipeline. The main change is the replacement of the previous "longform" inference logic with a new, language-aware, unified chunked inference path. This affects dataset preparation, model state management, argument parsing, and the inference runner, making the codebase simpler and more robust for both short and long texts.
Key changes:
Unified Inference and Text Chunking
- Replaced the previous longform inference logic with a unified, language-aware chunked inference path. (`examples/tts/magpietts_inference.py`, `nemo/collections/tts/data/text_to_speech_dataset.py`, `nemo/collections/tts/models/magpietts.py`)
- Removed the longform-specific CLI arguments (`--longform_mode`, `--longform_word_threshold`, etc.), simplifying the inference interface. (`examples/tts/magpietts_inference.py`)

Dataset and Collation Refactor
- Introduced `ChunkedTTSInferenceDataset` (replacing `LongFormTTSInferenceDataset`) with per-sample, language-aware chunking and tokenizer selection. The dataset now automatically decides the chunking strategy based on language and text length. (`nemo/collections/tts/data/text_to_speech_dataset.py`)
- Updated `collate_fn` to handle variable-length chunked batches, padding as needed, and to generalize beyond the previous longform-specific logic. (`nemo/collections/tts/data/text_to_speech_dataset.py`)

Model and State Naming Consistency
- Renamed the longform-specific classes (`LongformDecoderState` → `ChunkedDecoderState`, `LongformConfig` → `ChunkedInferenceConfig`) throughout the model code for clarity and consistency with the new unified approach. (`nemo/collections/tts/models/magpietts.py`)
- Removed the `_needs_longform_inference` method and all language threshold logic from the model, as chunking is now handled in a unified, language-aware way. (`nemo/collections/tts/models/magpietts.py`)

Utility and Import Updates
- Updated utilities and imports to match the changes above. (`nemo/collections/tts/models/magpietts.py`, `nemo/collections/tts/data/text_to_speech_dataset.py`)

These changes make the TTS inference pipeline easier to use and maintain, while improving support for multilingual and variable-length text inputs.
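To make the collation change concrete, here is a rough sketch of padding a batch in which samples have different numbers of chunks and chunks have different token lengths. It is an assumption-laden illustration in plain Python lists, not the PR's `collate_fn`: the name `sketch_collate`, the `PAD_ID` value, and the output keys are hypothetical, and the real code works with tensors and additional fields.

```python
from typing import Dict, List

PAD_ID = 0  # Hypothetical padding token id.


def sketch_collate(batch: List[List[List[int]]]) -> Dict[str, list]:
    """batch[i] is the list of token-id chunks for sample i."""
    max_chunks = max(len(sample) for sample in batch)
    max_len = max(len(chunk) for sample in batch for chunk in sample)

    tokens, lengths = [], []
    for sample in batch:
        padded_sample, sample_lengths = [], []
        for idx in range(max_chunks):
            # Samples with fewer chunks get empty (fully padded) chunks.
            chunk = sample[idx] if idx < len(sample) else []
            padded_sample.append(chunk + [PAD_ID] * (max_len - len(chunk)))
            sample_lengths.append(len(chunk))
        tokens.append(padded_sample)
        lengths.append(sample_lengths)

    return {
        "tokens": tokens,          # shape: [batch, max_chunks, max_len]
        "lengths": lengths,        # true token count per chunk (0 = padding chunk)
        "num_chunks": [len(sample) for sample in batch],
    }
```

Keeping the true lengths and chunk counts alongside the padded tokens lets a downstream loop skip padding chunks for samples that finish early.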
Collection: TTS