fix: add surrogate pair support for font substitution and text encoding #3133

romen2232 · 2025-03-25T14:52:06Z

Overview

This PR adds support for surrogate pairs (the two UTF-16 code units that encode a character outside the Basic Multilingual Plane) in font substitution and text encoding, so that such characters are treated as a single character rather than as two unrelated ones.

Context

In our project, we encountered issues when rendering mathematical symbols and special characters. For example, when attempting to display mathematical italic characters like 𝑥, the text rendering would break because these characters are represented as surrogate pairs in UTF-16, the encoding used by JavaScript strings. The existing implementation iterated over the string one code unit at a time, so each half of the pair was rendered as a separate, incorrect character.
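To make the problem concrete, here is a minimal sketch (standard JavaScript string behaviour only, not code from this PR) showing how 𝑥 (U+1D465) looks to code that walks a string one code unit at a time:

```typescript
// 𝑥 is U+1D465 (MATHEMATICAL ITALIC SMALL X), written with an escape for clarity.
const mathX: string = '\u{1D465}';

// Walking the string index by index yields two code units, not one character.
const codeUnits: number[] = [];
for (let i = 0; i < mathX.length; i += 1) {
  codeUnits.push(mathX.charCodeAt(i));
}

// codePointAt() reads the whole pair starting at the given index.
const codePoint: number | undefined = mathX.codePointAt(0);

console.log(codeUnits.map((unit) => unit.toString(16))); // high and low surrogate
console.log(codePoint?.toString(16));
```

Neither surrogate half is a printable character on its own, which is why per-unit handling produced the wrong glyphs.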

Changes

Font Substitution Engine

Updated the font substitution engine to properly handle surrogate pairs by using codePointAt() instead of character-by-character iteration
Added character length calculation that counts a surrogate pair as 2 string positions and a regular (BMP) character as 1
Improved run splitting logic to maintain correct character boundaries
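The idea behind these three changes can be sketched as follows. This is an illustration under stated assumptions, not the engine's actual code: the `Run` shape, the `pickFont` helper, and the font names are hypothetical and exist only to show code-point-aware run splitting.

```typescript
// Hypothetical run shape: [start, end) are string indices (UTF-16 positions).
type Run = { start: number; end: number; font: string };

// Hypothetical font picker: pretend only a math font covers non-BMP code points.
const pickFont = (codePoint: number): string =>
  codePoint > 0xffff ? 'MathFont' : 'Helvetica';

const splitRuns = (text: string): Run[] => {
  const runs: Run[] = [];
  let index = 0;

  while (index < text.length) {
    // Read the full code point, not just the code unit at this index.
    const codePoint = text.codePointAt(index) ?? 0;
    // A surrogate pair occupies 2 string positions, anything else 1.
    const length = codePoint > 0xffff ? 2 : 1;
    const font = pickFont(codePoint);

    const last = runs[runs.length - 1];
    if (last && last.font === font) {
      last.end = index + length; // extend the current run
    } else {
      runs.push({ start: index, end: index + length, font });
    }

    index += length; // never land in the middle of a pair
  }

  return runs;
};

console.log(splitRuns('a\u{1D465}b'));
```

Because the index advances by the character length, run boundaries can never split a surrogate pair in two.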

AFM Font Encoding

Modified the encodeText method in AFMFont to handle surrogate pairs correctly
Updated glyphsForString to use codePointAt() for proper Unicode character handling
Improved character iteration logic to account for surrogate pair lengths
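As a rough sketch of the iteration pattern described above (not the real `AFMFont.encodeText`; the function name, the single-byte mapping, and the fallback code `0` are assumptions made for illustration):

```typescript
// Hypothetical stand-in for an AFM-style encoder: one hex byte per character.
const encodeTextSketch = (text: string): string[] => {
  const encoded: string[] = [];
  let index = 0;

  while (index < text.length) {
    const codePoint = text.codePointAt(index) ?? 0;

    // Assumption: only code points up to 0xff have a slot in this toy encoding;
    // everything else falls back to 0 (a "missing glyph" placeholder).
    const charCode = codePoint <= 0xff ? codePoint : 0;
    encoded.push((charCode < 16 ? '0' : '') + charCode.toString(16));

    // Skip both halves of a surrogate pair in one step.
    index += codePoint > 0xffff ? 2 : 1;
  }

  return encoded;
};

console.log(encodeTextSketch('A\u{1D465}'));
```

With per-code-unit iteration the same input would produce three entries, one for `A` and one for each surrogate half; stepping by code point produces exactly one entry per character.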

Testing

Added comprehensive test coverage for surrogate pair handling:

Basic surrogate pair handling in text
Multiple surrogate pairs in sequence
Surrogate pairs interspersed with regular text
String conversion utilities for surrogate pairs
Mixed regular and surrogate pair code points

The tests live in:

packages/textkit/tests/engines/fontSubstitution.test.ts
packages/textkit/tests/utils/stringFromCodePoints.test.ts

Test cases cover various scenarios, including:

Single emoji characters
Multiple emojis in sequence
Mixed content with regular text and emojis
Edge cases with different Unicode ranges
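For readers unfamiliar with the string conversion side, the kind of round trip these scenarios exercise can be sketched like this. The helper names below are hypothetical and the real utility's signature may differ; the sketch relies only on the standard `String.fromCodePoint` and `codePointAt`.

```typescript
// Hypothetical helper: build a string from an array of code points.
const stringFromCodePointsSketch = (codePoints: number[]): string =>
  String.fromCodePoint(...codePoints);

// Hypothetical inverse: read code points back, stepping over surrogate pairs.
const codePointsFromStringSketch = (text: string): number[] => {
  const result: number[] = [];
  let index = 0;
  while (index < text.length) {
    const codePoint = text.codePointAt(index) ?? 0;
    result.push(codePoint);
    index += codePoint > 0xffff ? 2 : 1;
  }
  return result;
};

// Mixed content: 'H', U+1F600 (an emoji outside the BMP), 'i'.
const mixed: number[] = [0x48, 0x1f600, 0x69];
const mixedText: string = stringFromCodePointsSketch(mixed);

console.log(mixedText.length); // string positions, not characters
console.log(codePointsFromStringSketch(mixedText));
```

The string occupies four positions for three characters, which is the mismatch the tests guard against.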

Impact

This change improves the library's ability to handle modern Unicode text, particularly:

Emoji characters
Non-BMP Unicode characters
Mixed content with both regular text and special characters

Breaking Changes

None. This is a backward-compatible enhancement that improves existing functionality.

changeset-bot · 2025-03-25T14:52:10Z

⚠️ No Changeset found

Latest commit: 16aa4b4

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

romenmedina-aircury · 2025-03-27T19:32:58Z

Relevant issue
#3127

romen2232 added 2 commits March 25, 2025 15:57

test: add surrogate pair handling tests for font substitution and string conversion

70a7d66

fix: update text encoding to handle surrogate pairs in AFMFont and font substitution

0d52cad

romen2232 force-pushed the master branch from de7f7ec to 0d52cad on March 25, 2025 15:20

refactor: add getUTF16Increment utility and corresponding tests

16aa4b4
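The commit message names a `getUTF16Increment` utility but the conversation does not show its signature. A plausible shape, offered purely as an assumption, is a function that maps a code point to the number of string positions it occupies:

```typescript
// Assumed signature: code point in, number of UTF-16 code units out.
// The "Sketch" suffix marks this as illustrative, not the PR's implementation.
const getUTF16IncrementSketch = (codePoint: number): number =>
  codePoint > 0xffff ? 2 : 1;

// Example use: count characters (code points) rather than code units.
const countCodePoints = (text: string): number => {
  let count = 0;
  let index = 0;
  while (index < text.length) {
    index += getUTF16IncrementSketch(text.codePointAt(index) ?? 0);
    count += 1;
  }
  return count;
};

console.log(countCodePoints('a\u{1D465}b'));
```

Factoring the increment into one helper keeps the "2 for a pair, 1 otherwise" rule in a single place instead of repeating it in every loop.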
