romen2232

Overview

This PR adds support for characters outside the Basic Multilingual Plane (BMP), which JavaScript strings encode as UTF-16 surrogate pairs, in font substitution and text encoding. Previously each half of a pair was processed as a separate character.

Context

In our project, we encountered issues when rendering mathematical symbols and special characters. For example, mathematical italic characters such as 𝑥 broke text rendering because JavaScript strings represent them as surrogate pairs (two UTF-16 code units). The existing implementation processed each code unit separately, so each half of the pair was rendered as a different, incorrect character.
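To make the failure mode concrete, the snippet below inspects 𝑥 (U+1D465) with standard string methods. It is only an illustration of the underlying UTF-16 behaviour, not code from this PR; the variable names are made up for the example.

```typescript
// U+1D465 MATHEMATICAL ITALIC SMALL X lies outside the BMP.
const mathX = "\u{1D465}";

// One visible character, but two UTF-16 code units.
const codeUnitCount = mathX.length;

// Reading by code unit yields the two surrogate halves,
// neither of which is a meaningful character on its own.
const highSurrogate = mathX.charCodeAt(0);
const lowSurrogate = mathX.charCodeAt(1);

// Reading by code point yields the real character.
const codePoint = mathX.codePointAt(0);

console.log(codeUnitCount); // 2
console.log(highSurrogate.toString(16), lowSurrogate.toString(16)); // d835 dc65
console.log(codePoint?.toString(16)); // 1d465
```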

Changes

Font Substitution Engine

  • Updated the font substitution engine to iterate by code point using codePointAt() instead of by UTF-16 code unit, so a surrogate pair is treated as a single character
  • Added character length calculation so the string index advances by 2 positions for a surrogate pair and by 1 position for a regular character
  • Improved run splitting logic to maintain correct character boundaries
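The iteration pattern described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the engine's actual code: `CodePointSpan` and `codePointSpans` are hypothetical names introduced here, and the real engine tracks fonts and runs rather than returning a flat list.

```typescript
// Hypothetical shape: one entry per character, with its position in code units.
interface CodePointSpan {
  codePoint: number;
  start: number; // index of the first code unit in the string
  length: number; // 2 for a surrogate pair, 1 otherwise
}

function codePointSpans(text: string): CodePointSpan[] {
  const spans: CodePointSpan[] = [];
  let index = 0;

  while (index < text.length) {
    // index < text.length, so codePointAt() cannot return undefined here.
    const codePoint = text.codePointAt(index) as number;

    // Code points above U+FFFF occupy two UTF-16 code units.
    const length = codePoint > 0xffff ? 2 : 1;

    spans.push({ codePoint, start: index, length });
    index += length; // never lands between the two halves of a pair
  }

  return spans;
}
```

Because `start` and `length` stay in code units, a run boundary computed from these spans can be passed straight to `slice()` without cutting a pair in half.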

AFM Font Encoding

  • Modified the encodeText method in AFMFont to handle surrogate pairs correctly
  • Updated glyphsForString to use codePointAt() for proper Unicode character handling
  • Improved character iteration logic to account for surrogate pair lengths

Testing

Added comprehensive test coverage for surrogate pair handling:

  • Basic surrogate pair handling in text
  • Multiple surrogate pairs in sequence
  • Surrogate pairs interspersed with regular text
  • String conversion utilities for surrogate pairs
  • Mixed regular and surrogate pair code points
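For the string conversion cases, the reverse direction (code points back to a string) can be sketched as below. `codePointsToString` is a hypothetical helper written for this illustration; it is not necessarily how the library's own utility is implemented, and the built-in `String.fromCodePoint()` gives the same result.

```typescript
// Hypothetical helper: build a string from code points,
// splitting anything above U+FFFF into a surrogate pair.
function codePointsToString(codePoints: readonly number[]): string {
  let result = "";

  for (const codePoint of codePoints) {
    if (codePoint > 0xffff) {
      const offset = codePoint - 0x10000;
      const high = 0xd800 + (offset >> 10); // top 10 bits
      const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
      result += String.fromCharCode(high, low);
    } else {
      result += String.fromCharCode(codePoint);
    }
  }

  return result;
}
```

A test for mixed input would then compare, for example, `codePointsToString([0x61, 0x1d465])` against `"a𝑥"`.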

The tests are located in:

  • packages/textkit/tests/engines/fontSubstitution.test.ts
  • packages/textkit/tests/utils/stringFromCodePoints.test.ts

Test cases cover various scenarios, including:

  • Single emoji characters
  • Multiple emojis in sequence
  • Mixed content with regular text and emojis
  • Edge cases with different Unicode ranges

Impact

This change improves the library's ability to handle modern Unicode text, particularly:

  • Emoji characters
  • Non-BMP Unicode characters
  • Mixed content with both regular text and special characters

Breaking Changes

None. This is a backward-compatible enhancement that improves existing functionality.


changeset-bot bot commented Mar 25, 2025

⚠️ No Changeset found

Latest commit: 16aa4b4

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@romenmedina-aircury

Relevant issue: #3127
