DiffSinger Refined Phonemizer Implementation #1840

overdramatic · 2025-11-23T14:05:25Z

DiffSinger Refined Phonemizer Implementation

Overview

This PR introduces DiffSingerRefinedPhonemizer, an abstract base class built on DiffSingerBasePhonemizer and DiffSingerG2pPhonemizer that provides shared infrastructure for DiffSinger-based phonemizers in OpenUtau. It adds phoneme replacement driven by replacement lists in the dictionary, plus hooks for word-level, timing-based and duration-dependent modifications.

Key Features

Advanced Phoneme Processing: Supports single, merge (M:1), split (1:M), and many-to-many phoneme replacements
Multi-language Support: Handles language-specific phoneme mappings and prefixes
Word-Level Editing: Enables cross-phoneme modifications across entire words
Comprehensive G2P System: Integrates grapheme-to-phoneme conversion with customizable dictionaries

Benefits

Production Ready: Comprehensive error handling and logging
High Performance: Optimized processing with tensor caching
Flexible: Extensive customization points
Robust: Graceful fallbacks and validation
Maintainable: Clean architecture with clear separation of concerns

This implementation provides a solid foundation for sophisticated singing synthesis with DiffSinger, supporting complex linguistic phenomena while maintaining performance and extensibility.

Usage

Derive from DiffSingerRefinedPhonemizer and override:

EditPhonemesForWord() for word-level modifications
EditTimedPhonemes() for timing-based transformations
ApplyDurationBasedReplacements() for duration-dependent changes
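The hook-based design above can be sketched roughly as follows. This is a Python illustration, not the C# API. The method names are snake_case stand-ins for EditPhonemesForWord() and ApplyDurationBasedReplacements(), and the signatures are hypothetical. The sketch assumes word-level edits run before durations are predicted and duration-dependent replacements run after. The tap rule, the 60-tick threshold and the r_tap symbol are invented for the example.

```python
from typing import Callable, List, Tuple

Phoneme = Tuple[str, int]  # (symbol, duration in ticks)


class RefinedPhonemizerSketch:
    """Hypothetical stand-in for the base class: hooks default to no-ops."""

    def edit_phonemes_for_word(self, word: List[str]) -> List[str]:
        return word

    def apply_duration_based_replacements(self, phonemes: List[Phoneme]) -> List[Phoneme]:
        return phonemes

    def run(self, word: List[str],
            predict_durations: Callable[[List[str]], List[int]]) -> List[Phoneme]:
        # Assumed order: word-level edits, then duration prediction,
        # then duration-dependent replacements on the timed result.
        symbols = self.edit_phonemes_for_word(list(word))
        durations = predict_durations(symbols)
        return self.apply_duration_based_replacements(list(zip(symbols, durations)))


class TapPhonemizer(RefinedPhonemizerSketch):
    SHORT_TICKS = 60  # hypothetical threshold

    def apply_duration_based_replacements(self, phonemes: List[Phoneme]) -> List[Phoneme]:
        # Hypothetical rule: an "r" shorter than SHORT_TICKS becomes a tap.
        return [("r_tap", d) if s == "r" and d < self.SHORT_TICKS else (s, d)
                for s, d in phonemes]


# Fixed durations stand in for the duration model.
timed = TapPhonemizer().run(["k", "a", "r", "o"], lambda syms: [40, 300, 30, 300])
```

A subclass overrides only the hooks it needs. The base class keeps the pipeline order, which is the usual template-method split.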

For dsdict.yaml replacements:

replacements:
- {from: ng, to: pt/g} # Single (1:1)
- {from: [k, w], to: pt/kw} # Merge (M:1)
- {from: gw, to: [pt/g, pt/w]} # Split (1:M)
- {from: [k, w, an], to: [pt/s, pt/t, pt/r, pt/a]} # Many-to-many

It uses the same rules as DiffSingerG2pPhonemizer: the from side is written without the language tag, and the to side includes the tag (e.g. pt/) if the model has multi-dictionary support.
The phonemizer applies the replacements in a fixed order: single -> merge -> split -> many-to-many.
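Below is a minimal Python sketch of that ordering. It rests on several assumptions:

- Each rule is applied as one greedy left-to-right pass over the phoneme list.
- Rules of the same type run in file order.
- Each pass sees the output of the previous one.
- Language tagging of unreplaced phonemes is left out.

The function names are hypothetical, and this is not the C# implementation.

```python
from typing import List, Sequence, Union

Side = Union[str, Sequence[str]]
ORDER = ("single", "merge", "split", "many-to-many")


def as_list(side: Side) -> List[str]:
    # In dsdict.yaml a bare scalar stands for a single phoneme.
    return [side] if isinstance(side, str) else list(side)


def classify(rule: dict) -> str:
    src, dst = as_list(rule["from"]), as_list(rule["to"])
    if len(src) == 1 and len(dst) == 1:
        return "single"
    if len(dst) == 1:
        return "merge"
    if len(src) == 1:
        return "split"
    return "many-to-many"


def apply_rule(phonemes: List[str], src: List[str], dst: List[str]) -> List[str]:
    # One greedy left-to-right pass; replaced output is not rescanned in this pass.
    out, i = [], 0
    while i < len(phonemes):
        if phonemes[i:i + len(src)] == src:
            out.extend(dst)
            i += len(src)
        else:
            out.append(phonemes[i])
            i += 1
    return out


def apply_replacements(phonemes: List[str], rules: List[dict]) -> List[str]:
    # Fixed type order; within a type, rules keep their file order.
    for kind in ORDER:
        for rule in rules:
            if classify(rule) == kind:
                phonemes = apply_rule(phonemes, as_list(rule["from"]), as_list(rule["to"]))
    return phonemes


RULES = [  # the dsdict.yaml example above, as Python data
    {"from": "ng", "to": "pt/g"},
    {"from": ["k", "w"], "to": "pt/kw"},
    {"from": "gw", "to": ["pt/g", "pt/w"]},
    {"from": ["k", "w", "an"], "to": ["pt/s", "pt/t", "pt/r", "pt/a"]},
]
```

Under these assumptions, apply_replacements(["k", "w", "an"], RULES) returns ["pt/kw", "an"]. The merge rule consumes k and w before the many-to-many rule runs, so the [k, w, an] rule never matches. Whether the real implementation behaves the same way depends on details the description does not spell out.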

- Add comprehensive G2P dictionary support with replacement types (single, merge, split, many-to-many)
- Implement sophisticated phoneme validation and language prefix handling
- Add timing-based phoneme processing with duration-dependent modifications
- Include word-level phoneme editing capabilities for cross-phoneme modifications
- Integrate ONNX machine learning models (linguistic and duration prediction)
- Support multi-language models with language ID mapping
- Add comprehensive speaker embedding management
- Include tensor caching for performance optimization
- Implement robust error handling and validation throughout
- Add extensive XML documentation for all public APIs

This phonemizer provides a robust foundation for DiffSinger voice synthesis with support for complex phoneme transformations and cross-note processing.
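Below is a hedged Python sketch of one plausible shape for the language prefix handling and validation step. It assumes three things:

- The model's phoneme inventory is available as a set.
- Tagged phonemes use the lang/phoneme form seen in the dsdict.yaml example (pt/g).
- Language-neutral symbols (for example SP for silence) are stored untagged.

resolve_phoneme and the inventory below are hypothetical.

```python
from typing import Optional, Set


def resolve_phoneme(symbol: str, lang: Optional[str], inventory: Set[str]) -> Optional[str]:
    # Already tagged (e.g. "pt/g"): accept only if the model knows it.
    if "/" in symbol:
        return symbol if symbol in inventory else None
    # On a multi-dictionary model, prefer the language-tagged form.
    if lang:
        tagged = f"{lang}/{symbol}"
        if tagged in inventory:
            return tagged
    # Fall back to the bare symbol (assumed for language-neutral phonemes).
    return symbol if symbol in inventory else None


INVENTORY = {"pt/g", "pt/a", "SP"}  # hypothetical model inventory
```

Returning None rather than raising leaves the fallback policy (skip, substitute, or log) to the caller. That fits the graceful-fallback goal listed under Benefits.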

Refactored EditTimedPhonemes to accept the next note and the duration of its first phoneme. Updated ProcessPart to collect phoneme timing data in two passes, so timing edits have access to information about the neighboring note.
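The two-pass flow described above can be sketched like this in Python. ProcessPart and EditTimedPhonemes are C# methods whose real signatures are not shown in this PR. Everything below is hypothetical: process_part, TimedNote, the hook signature, the tick values and the absorb_coda rule. The sketch only shows that the first pass must finish for every note before the second pass can read the next note's first phoneme duration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Phoneme = Tuple[str, int]  # (symbol, duration in ticks)


@dataclass
class TimedNote:
    lyric: str
    phonemes: List[Phoneme] = field(default_factory=list)


EditHook = Callable[[TimedNote, Optional[TimedNote], Optional[int]], List[Phoneme]]


def process_part(notes: List[TimedNote],
                 predict: Callable[[TimedNote], List[Phoneme]],
                 edit: EditHook) -> List[TimedNote]:
    # Pass 1: predict phonemes and durations for every note.
    for note in notes:
        note.phonemes = predict(note)
    # Pass 2: each edit sees the next note and its first phoneme duration.
    # The next note's phonemes are still the unedited pass-1 values here.
    for i, note in enumerate(notes):
        nxt = notes[i + 1] if i + 1 < len(notes) else None
        nxt_first = nxt.phonemes[0][1] if nxt is not None and nxt.phonemes else None
        note.phonemes = edit(note, nxt, nxt_first)
    return notes


def absorb_coda(note: TimedNote, nxt: Optional[TimedNote],
                nxt_first: Optional[int]) -> List[Phoneme]:
    # Hypothetical rule: before a very short onset (< 60 ticks), fold the final
    # phoneme into the preceding one so the note's total length is unchanged.
    ph = list(note.phonemes)
    if nxt_first is not None and nxt_first < 60 and len(ph) >= 2:
        _, last_dur = ph.pop()
        sym, dur = ph[-1]
        ph[-1] = (sym, dur + last_dur)
    return ph


# Fixed values stand in for the linguistic and duration models.
TABLE = {"paz": [("p", 30), ("a", 400), ("s", 50)],
         "sol": [("s", 40), ("o", 400), ("w", 60)]}
part = process_part([TimedNote("paz"), TimedNote("sol")],
                    lambda n: list(TABLE[n.lyric]), absorb_coda)
```

In a single pass, the next note's durations would not exist yet when the current note is edited. Splitting prediction and editing avoids that.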

hecko-yes · 2025-11-23T19:33:32Z

Might it make more sense to reverse the order of replacements, or even use the order as listed in the .yaml file for maximum flexibility? For example, this isn't possible if you process single replacements before everything else:

replacements:
- {from: [s, p], to: [s, p_]}
- {from: p, to: p_h}
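To make the ordering question concrete, here is a hedged Python sketch that runs the example above both ways. It assumes each rule is one greedy left-to-right pass whose output later rules see, and the helper names are hypothetical. With singles first, p becomes p_h before the [s, p] rule is tried, so the s-p context is lost. In file order, the [s, p] rule claims that p first.

```python
from typing import List, Sequence, Union


def as_list(side: Union[str, Sequence[str]]) -> List[str]:
    # A bare scalar stands for a single phoneme.
    return [side] if isinstance(side, str) else list(side)


def apply_rule(phonemes: List[str], rule: dict) -> List[str]:
    # Greedy left-to-right pass replacing each match of rule["from"].
    src, dst = as_list(rule["from"]), as_list(rule["to"])
    out, i = [], 0
    while i < len(phonemes):
        if phonemes[i:i + len(src)] == src:
            out.extend(dst)
            i += len(src)
        else:
            out.append(phonemes[i])
            i += 1
    return out


def run(phonemes: List[str], rules: List[dict]) -> List[str]:
    for rule in rules:
        phonemes = apply_rule(phonemes, rule)
    return phonemes


RULES = [{"from": ["s", "p"], "to": ["s", "p_"]},  # 2:2 (many-to-many)
         {"from": "p", "to": "p_h"}]               # 1:1 (single)

file_order = run(["s", "p", "a"], RULES)
# Only singles vs. multi-phoneme rules matter here; sorted() is stable.
singles_first = run(["s", "p", "a"], sorted(RULES, key=lambda r: len(as_list(r["from"])) > 1))
```

In file order the result is s p_ a, while singles-first gives s p_h a. A lone p becomes p_h either way.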

overdramatic · 2025-11-23T19:49:04Z


I don't get what you are trying to say, but this base class follows the replacement order:
single -> merge -> split -> many-to-many

For more sophisticated or in-depth customization, it's better to implement it inside the phonemizer code.

overdramatic added 2 commits November 14, 2025 09:04

update: EditTimedPhonemes

b64ed07


overdramatic mentioned this pull request Nov 23, 2025

Add DiffSinger BRAPA Phonemizer and BRAPA G2P model #1841

Open
