fix: British English pronunciation corrections (NURSE vowel, stress, yod, French loanwords) #15
Open
yocontra wants to merge 2 commits into hans00:main from
When ipa-dict provides multiple pronunciation variants for a word (e.g. "caught" → /ˈkɑt/, /ˈkɔt/), prefer the variant containing ɔ (THOUGHT vowel) over ɑ (LOT vowel). This better represents standard American English pronunciation for THOUGHT-class words like caught, bought, law, fall, walk, want, etc.
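A minimal sketch of the variant preference described in this commit message. The helper names `parseVariants` and `pickVariant` are hypothetical, and the comma-separated input shape is an assumption based on the "caught" example above, not the actual ipa-dict format handling in this repository.

```typescript
// Hypothetical helper: split an entry such as "/ˈkɑt/, /ˈkɔt/" into variants.
// Assumes variants are comma-separated, as in the example above.
function parseVariants(entry: string): string[] {
  return entry
    .split(",")
    .map((v) => v.trim())
    .filter((v) => v.length > 0);
}

// Hypothetical helper: prefer the first variant containing ɔ (THOUGHT vowel);
// fall back to the first listed variant when none contains it.
function pickVariant(variants: string[]): string | undefined {
  for (const v of variants) {
    if (v.indexOf("ɔ") !== -1) return v;
  }
  return variants.length > 0 ? variants[0] : undefined;
}

const chosen = pickVariant(parseVariants("/ˈkɑt/, /ˈkɔt/"));
console.log(chosen);
```

Note that a plain substring check also matches ɔ inside other vowels such as the CHOICE diphthong ɔɪ, so a real implementation may need a stricter test.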
…yod, French loanwords)

Add fixBritishDict() post-processing function to build-dict.ts that applies RP pronunciation fixes when building the en-gb dictionary:

1. NURSE vowel: əː → ɜː (əː is not a real English phoneme)
2. French -age loanwords: final ʤ → ʒ (garage, massage, etc.)
3. -arily adverb stress: add secondary stress on penultimate -ɛɹ-
4. -ormation derivatives: preserve /ɔː/ in -form- syllable
5. Yod insertion after /s/: suː → sjuː (suit, super, etc.)

Also adds a GB dictionary build step that derives en-gb/dict.json from the same en_US IPA source with these fixes applied.
yocontra added a commit to yocontra/phonemize that referenced this pull request Mar 15, 2026
Merges the following PRs into a single combined branch:
- hans00#11: fix: prefer THOUGHT vowel (ɔ) variant in dictionary
- hans00#12: fix: use STRUT vowel (ʌ) instead of schwa (ə) in stressed syllables
- hans00#13: fix: distinguish NURSE vowel (ɜː) from unstressed ɚ, add linking ɹ
- hans00#14: feat: add vowel length marks (ː) to IPA output
- hans00#15: fix: British English pronunciation corrections
Summary

Adds a `fixBritishDict()` post-processing function to `scripts/build-dict.ts` that corrects systematic RP (Received Pronunciation) errors when building the British English dictionary. Also adds a build step that produces `data/en-gb/dict.json` from the same `en_US` IPA source with these fixes applied.

Pronunciation Fixes

| Category | Issue | Fix | Example |
| --- | --- | --- | --- |
| NURSE vowel | `əː` used in dictionary entries | `əː` → `ɜː` | /nəːs/ → /nɜːs/ |
| French -age loanwords | /dʒ/ used | `ʤ` → `ʒ` for French borrowings | /ɡæɹɑːʤ/ → /ɡæɹɑːʒ/ |
| -arily adverb stress | Secondary stress not marked | Add `ˌ` on penultimate `-ɛɹ-` | /pɹaɪmɛɹɪli/ → /pɹaɪˌmɛɹɪli/ |
| -ormation derivatives | `-form-` vowel reduced to /ə/ | Preserve /ɔː/ in `-form-` syllable | /ɪnfəɹmeɪʃən/ → /ɪnfɔːɹmeɪʃən/ |
| Yod after /s/ | Yod missing (/suː/) | Insert /j/ before /uː/ after /s/ | /suːt/ → /sjuːt/ |

Rationale

1. NURSE vowel (`əː` → `ɜː`)

`əː` is not a recognized phoneme in any English dialect. The NURSE vowel is universally transcribed as /ɜː/ in IPA, the open-mid central unrounded vowel. The source dictionary's use of `əː` appears to be an error.

2. French -age loanwords (final `ʤ` → `ʒ`)

RP consistently uses the fricative /ʒ/ (not the affricate /dʒ/) for French borrowings ending in -age: garage, massage, barrage, camouflage, sabotage, espionage, reportage, decoupage, persiflage, badinage. This is one of the most recognizable BrE/AmE pronunciation differences.

3. -arily adverb stress

BrE places secondary stress on the penultimate syllable of -arily adverbs (e.g. primarily, necessarily), whereas AmE stresses the first syllable. The fix adds a secondary stress marker `ˌ` before the `ɛɹ` sequence for the 10 most common words in this class.

4. -ormation derivatives (/ə/ → /ɔː/)

BrE preserves the full /ɔː/ vowel in the -form- syllable of words like information, transformation, confirmation, rather than reducing it to schwa as in casual AmE.

5. Yod insertion after /s/ (/suː/ → /sjuː/)

BrE retains the palatal glide /j/ before /uː/ after /s/ in words like sue, suit, super, superb, superior. This is part of the broader BrE yod-retention pattern.

Changes

`scripts/build-dict.ts`: Added `fixBritishDict(dict)` function (with detailed comments) and a new build block that produces `data/en-gb/dict.json`
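
For readers without the diff open, the five fixes can be sketched roughly as follows. This is an illustrative approximation, not the code in this PR: the `Dict` shape (word → IPA string without surrounding slashes), the word lists, and the regular expressions are all assumptions, and the -arily rule assumes a single-consonant onset so that the result matches the /pɹaɪˌmɛɹɪli/ example.

```typescript
// Assumed dictionary shape: word -> IPA string without surrounding slashes.
type Dict = { [word: string]: string };

// Word lists taken from the examples named in the PR description (assumed scope).
const AGE_LOANWORDS = ["garage", "massage", "barrage", "camouflage", "sabotage",
  "espionage", "reportage", "decoupage", "persiflage", "badinage"];
const ARILY_WORDS = ["primarily", "necessarily"]; // the PR covers 10 words; only two are named
const YOD_WORDS = ["sue", "suit", "super", "superb", "superior"];

function fixBritishDictSketch(dict: Dict): Dict {
  const out: Dict = {};
  for (const word of Object.keys(dict)) {
    // 1. NURSE vowel: əː -> ɜː everywhere.
    let ipa = dict[word].replace(/əː/g, "ɜː");

    // 2. French -age loanwords: final affricate ʤ -> fricative ʒ.
    if (AGE_LOANWORDS.indexOf(word) !== -1) {
      ipa = ipa.replace(/ʤ$/, "ʒ");
    }

    // 3. -arily adverbs: put ˌ before the consonant that opens the -ɛɹ- syllable,
    //    unless a secondary stress mark is already there.
    if (ARILY_WORDS.indexOf(word) !== -1 && !/ˌ.ɛɹɪli$/.test(ipa)) {
      ipa = ipa.replace(/(.)ɛɹɪli$/, "ˌ$1ɛɹɪli");
    }

    // 4. -ormation derivatives: restore /ɔː/ in the -form- syllable
    //    (an optional stress mark before "m" is kept in place).
    if (/ormation$/.test(word)) {
      ipa = ipa.replace(/fəɹ([ˈˌ]?m)/, "fɔːɹ$1");
    }

    // 5. Yod insertion: suː -> sjuː, limited to listed words so "soup" is untouched.
    if (YOD_WORDS.indexOf(word) !== -1) {
      ipa = ipa.replace("suː", "sjuː");
    }

    out[word] = ipa;
  }
  return out;
}

const sample = fixBritishDictSketch({
  nurse: "nəːs",
  garage: "ɡæɹɑːʤ",
  primarily: "pɹaɪmɛɹɪli",
  information: "ɪnfəɹmeɪʃən",
  suit: "suːt",
  soup: "suːp",
});
console.log(sample);
```

Restricting rules 2, 3 and 5 to explicit word lists keeps unrelated entries (for example "soup" or "judge") unchanged, at the cost of missing derived forms; whether the actual function uses lists or patterns is not shown in this description.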