@overdramatic
Contributor

DiffSinger Refined Phonemizer Implementation

Overview

This PR introduces DiffSingerRefinedPhonemizer, an abstract base class built on DiffSingerBasePhonemizer and DiffSingerG2pPhonemizer that provides shared infrastructure for DiffSinger-based phonemizers in OpenUtau. It supports phoneme replacement driven by replacement lists, as well as word-level, timing-based, and duration-dependent modifications.

Key Features

  • Advanced Phoneme Processing: Supports single, merge (M:1), split (1:M), and many-to-many phoneme replacements
  • Multi-language Support: Handles language-specific phoneme mappings and prefixes
  • Word-Level Editing: Enables cross-phoneme modifications across entire words
  • Comprehensive G2P System: Integrates grapheme-to-phoneme conversion with customizable dictionaries

Benefits

  • Production Ready: Comprehensive error handling and logging
  • High Performance: Optimized processing with caching
  • Flexible: Extensive customization points
  • Robust: Graceful fallbacks and validation
  • Maintainable: Clean architecture with clear separation of concerns

This implementation provides a solid foundation for sophisticated singing synthesis with DiffSinger, supporting complex linguistic phenomena while maintaining performance and extensibility.

Usage

Derive from DiffSingerRefinedPhonemizer and override:

  • EditPhonemesForWord() for word-level modifications
  • EditTimedPhonemes() for timing-based transformations
  • ApplyDurationBasedReplacements() for duration-dependent changes

For dsdict.yaml replacements:

replacements:
- {from: ng, to: pt/g} # Single replacement
- {from: [k, w], to: pt/kw} # Merge
- {from: gw, to: [pt/g, pt/w]} # Split
- {from: [k, w, an], to: [pt/s, pt/t, pt/r, pt/a]} # Many-to-many

It uses the same rules as DiffSingerG2pPhonemizer: from is written without the language tag, and to is written with the tag if the model has multi-dict support.
The phonemizer applies the replacements in this order: single -> merge -> split -> many-to-many
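
The ordering described above can be sketched in Python. This is a minimal illustration, not the C# code from the PR: the names `kind`, `apply_rule`, and `apply_in_phases` are hypothetical, and the model (each rule rewrites the output of the previous one, matching left to right) is an assumption about how the phases chain.

```python
from typing import List, Sequence, Tuple

# (from, to), both normalized to lists of phoneme symbols.
Rule = Tuple[List[str], List[str]]


def kind(rule: Rule) -> int:
    """Rank a rule by shape: single=0, merge=1, split=2, many-to-many=3."""
    src, dst = rule
    if len(src) == 1 and len(dst) == 1:
        return 0
    if len(src) > 1 and len(dst) == 1:
        return 1
    if len(src) == 1 and len(dst) > 1:
        return 2
    return 3


def apply_rule(phonemes: Sequence[str], rule: Rule) -> List[str]:
    """Replace every non-overlapping match of the rule's source, left to right."""
    src, dst = rule
    out: List[str] = []
    i = 0
    while i < len(phonemes):
        if list(phonemes[i:i + len(src)]) == src:
            out.extend(dst)
            i += len(src)
        else:
            out.append(phonemes[i])
            i += 1
    return out


def apply_in_phases(phonemes: Sequence[str], rules: List[Rule]) -> List[str]:
    """Apply rules grouped by kind; sorted() is stable, so file order holds within a kind."""
    result = list(phonemes)
    for rule in sorted(rules, key=kind):
        result = apply_rule(result, rule)
    return result


# The four rules from the dsdict.yaml example, deliberately listed out of order.
RULES: List[Rule] = [
    (["k", "w", "an"], ["pt/s", "pt/t", "pt/r", "pt/a"]),  # many-to-many
    (["gw"], ["pt/g", "pt/w"]),                            # split
    (["k", "w"], ["pt/kw"]),                               # merge
    (["ng"], ["pt/g"]),                                    # single
]

print(apply_in_phases(["ng", "k", "w", "gw"], RULES))
# ['pt/g', 'pt/kw', 'pt/g', 'pt/w']
```

Under this sequential model, the input `["k", "w", "an"]` comes out as `["pt/kw", "an"]`, because the merge phase consumes `k` and `w` before the many-to-many rule is tried. Whether the actual base class behaves the same way is not stated in the PR.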

- Add comprehensive G2P dictionary support with replacement types (single, merge, split, many-to-many)
- Implement sophisticated phoneme validation and language prefix handling
- Add timing-based phoneme processing with duration-dependent modifications
- Include word-level phoneme editing capabilities for cross-phoneme modifications
- Integrate ONNX machine learning models (linguistic and duration prediction)
- Support multi-language models with language ID mapping
- Add comprehensive speaker embedding management
- Include tensor caching for performance optimization
- Implement robust error handling and validation throughout
- Add extensive XML documentation for all public APIs

This phonemizer provides a robust foundation for DiffSinger voice synthesis with support for complex phoneme transformations and cross-note processing.
Refactored EditTimedPhonemes to accept next note and its first phoneme duration.
Updated ProcessPart to collect phoneme timing data in two passes, so that timing edits have access to neighboring note information.
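
The two-pass idea can be sketched in Python as follows. This is only an illustration of the pattern, not the PR's C# implementation: `TimedNote`, `process_part`, the even split standing in for the duration model, and the `velarize_before_short_k` rule with its 60 ms threshold are all hypothetical, and every note is assumed to have at least one phoneme.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class TimedNote:
    phonemes: Tuple[str, ...]
    durations_ms: Tuple[float, ...]  # one entry per phoneme


# Arguments: the note, the next note (or None), and the duration of the
# next note's first phoneme (or None).
EditFn = Callable[[TimedNote, Optional[TimedNote], Optional[float]], TimedNote]


def process_part(raw: List[Tuple[List[str], float]], edit: EditFn) -> List[TimedNote]:
    # Pass 1: compute timing for every note before any edit runs.
    # An even split of the note length stands in for the duration model.
    timed = [
        TimedNote(tuple(ph), tuple(length / len(ph) for _ in ph))
        for ph, length in raw
    ]
    # Pass 2: edit each note with its right-hand neighbor's timing visible.
    out: List[TimedNote] = []
    for i, note in enumerate(timed):
        nxt = timed[i + 1] if i + 1 < len(timed) else None
        first_ms = nxt.durations_ms[0] if nxt is not None else None
        out.append(edit(note, nxt, first_ms))
    return out


def velarize_before_short_k(
    note: TimedNote, nxt: Optional[TimedNote], first_ms: Optional[float]
) -> TimedNote:
    """Hypothetical rule: final n becomes ng when the next note opens with a k under 60 ms."""
    if (
        nxt is not None
        and first_ms is not None
        and first_ms < 60
        and nxt.phonemes[0] == "k"
        and note.phonemes[-1] == "n"
    ):
        return replace(note, phonemes=note.phonemes[:-1] + ("ng",))
    return note


result = process_part(
    [(["s", "a", "n"], 300.0), (["k", "a"], 100.0)],
    velarize_before_short_k,
)
print([n.phonemes for n in result])
# [('s', 'a', 'ng'), ('k', 'a')]
```

Collecting all timings first is what makes the second pass possible: a single pass would reach the first note before the duration of the following note's first phoneme is known.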
@hecko-yes

Might it make more sense to reverse the order of replacements, or even use the order as listed in the .yaml file for maximum flexibility? For example, this isn't possible if you process single replacements before everything else:

replacements:
- {from: [s, p], to: [s, p_]}
- {from: p, to: p_h}
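
The difference can be checked with a small Python sketch. It assumes a simple model in which each rule rewrites the output of the previous one, matching left to right; `apply_rule` and `apply_all` are hypothetical helpers, not code from the PR. One reading of the example is that the `[s, p]` rule is meant to exempt `p` after `s` from the general `p` rule.

```python
from typing import List, Sequence, Tuple

# (from, to), both normalized to lists of phoneme symbols.
Rule = Tuple[List[str], List[str]]


def apply_rule(phonemes: Sequence[str], rule: Rule) -> List[str]:
    """Replace every non-overlapping match of the rule's source, left to right."""
    src, dst = rule
    out: List[str] = []
    i = 0
    while i < len(phonemes):
        if list(phonemes[i:i + len(src)]) == src:
            out.extend(dst)
            i += len(src)
        else:
            out.append(phonemes[i])
            i += 1
    return out


def apply_all(phonemes: Sequence[str], rules: List[Rule]) -> List[str]:
    """Apply the rules strictly in the order given."""
    result = list(phonemes)
    for rule in rules:
        result = apply_rule(result, rule)
    return result


CLUSTER: Rule = (["s", "p"], ["s", "p_"])  # many-to-many
GENERAL: Rule = (["p"], ["p_h"])           # single

word = ["s", "p", "a", "p", "a"]

# Order as listed in the .yaml file: the cluster rule runs first.
file_order = apply_all(word, [CLUSTER, GENERAL])
# Single replacements first: the general rule rewrites every p,
# so the cluster rule no longer finds [s, p].
single_first = apply_all(word, [GENERAL, CLUSTER])

print(file_order)    # ['s', 'p_', 'a', 'p_h', 'a']
print(single_first)  # ['s', 'p_h', 'a', 'p_h', 'a']
```

In this model only the file-order run keeps `p_` after `s`, which is the behavior the two rules appear to be aiming for.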

@overdramatic
Contributor Author

Might it make more sense to reverse the order of replacements, or even use the order as listed in the .yaml file for maximum flexibility? For example, this isn't possible if you process single replacements before everything else:

replacements:
- {from: [s, p], to: [s, p_]}
- {from: p, to: p_h}

I don't get what you are trying to say, but this base class follows the replacement order:
single -> merge -> split -> many-to-many

For more sophisticated or in-depth customization, it's better to implement it inside the phonemizer code.
