Strip invisible Unicode from link hrefs for defense-in-depth#3299
Open
romanisa wants to merge 5 commits intomicrosoft:masterfrom
Open
Strip invisible Unicode from link hrefs for defense-in-depth#3299romanisa wants to merge 5 commits intomicrosoft:masterfrom
romanisa wants to merge 5 commits intomicrosoft:masterfrom
Conversation
Strip invisible Unicode characters (zero-width chars, bidirectional marks, Unicode Tags U+E0001-E007F, etc.) from link href attributes at multiple layers to prevent hidden content injection via mailto: links. Changes: - Add stripInvisibleUnicode utility in roosterjs-content-model-dom - Apply stripping in sanitizeElement.ts (HTML paste/sanitization path) - Apply stripping in checkXss.ts (programmatic link insertion path) - Apply stripping in linkFormatHandler.ts (DOM-to-model conversion path) - Apply stripping in linkProcessor.ts (DOM-to-model conditional check) - Add comprehensive unit tests for all changes Bug: https://outlookweb.visualstudio.com/Outlook%20Web/_workitems/edit/409639 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Strips invisible Unicode characters from link href values across sanitization, model conversion, and XSS checking to prevent hidden-content injection (notably via mailto:).
Changes:
- Introduces
stripInvisibleUnicode()utility and exports it fromroosterjs-content-model-dom. - Applies stripping during DOM→model conversion (
linkProcessor,linkFormatHandler) and HTML sanitization (sanitizeElement). - Updates
checkXss()to strip invisible Unicode before evaluatingscript:patterns and adds unit tests across layers.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/roosterjs-content-model-dom/lib/domUtils/stripInvisibleUnicode.ts | Adds the core stripping utility via a consolidated regex. |
| packages/roosterjs-content-model-dom/lib/index.ts | Exposes stripInvisibleUnicode from the package entrypoint. |
| packages/roosterjs-content-model-dom/lib/formatHandlers/segment/linkFormatHandler.ts | Strips invisibles when reading link formats from DOM attributes. |
| packages/roosterjs-content-model-dom/lib/domToModel/processors/linkProcessor.ts | Strips invisibles while processing <a href> into the content model. |
| packages/roosterjs-content-model-core/lib/command/createModelFromHtml/sanitizeElement.ts | Strips invisibles when sanitizing href attributes. |
| packages/roosterjs-content-model-dom/test/domUtils/stripInvisibleUnicodeTest.ts | Adds focused unit coverage for stripping behavior across many code points. |
| packages/roosterjs-content-model-dom/test/domToModel/processors/linkProcessorTest.ts | Ensures DOM→model link processing strips invisible Unicode in href. |
| packages/roosterjs-content-model-core/test/command/createModelFromHtml/sanitizeElementTest.ts | Ensures sanitizer strips invisibles from href but not unrelated attributes. |
| packages/roosterjs-content-model-api/lib/publicApi/utils/checkXss.ts | Strips invisibles before XSS detection and returns sanitized links. |
| packages/roosterjs-content-model-api/test/publicApi/utils/checkXssTest.ts | Adds tests for invisible Unicode stripping + script: obfuscation detection. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
packages/roosterjs-content-model-core/lib/command/createModelFromHtml/sanitizeElement.ts
Outdated
Show resolved
Hide resolved
packages/roosterjs-content-model-dom/lib/formatHandlers/segment/linkFormatHandler.ts
Outdated
Show resolved
Hide resolved
packages/roosterjs-content-model-core/lib/command/createModelFromHtml/sanitizeElement.ts
Outdated
Show resolved
Hide resolved
…t/linkFormatHandler.ts Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…, use strict equality - Strip invisible Unicode from href BEFORE the script: regex check to prevent XSS bypass (e.g., s\u200Bcript: passing the check then being stripped to script:) - Guard against empty href after stripping in linkFormatHandler (only set format.href when sanitizedHref is non-empty) - Use strict equality (===) for attribute name checks Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… comments Extend the stripped character set to include Mongolian free variation selectors (U+180B-180D), interlinear annotation anchors (U+FFF9-FFFB), and extended Unicode Tags (U+E0080-E00FF). Add defense-in-depth comments at each call site, document ZWJ/emoji and URL-encoding limitations, and add 4 new tests for the expanded character ranges. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add decodeURIComponent before stripping so that URL-encoded invisible characters (e.g. %E2%80%8B for U+200B) are also caught. Falls back gracefully on malformed percent-encoding. Adds 5 new tests for URL-encoded scenarios. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Strips invisible Unicode characters (zero-width chars, bidirectional marks, Unicode Tags U+E0001-U+E007F, etc.) from link
hrefattributes at multiple layers to prevent hidden content injection viamailto:links.Bug: ADO #409639 - Rooster should strip or neutralize invisible Unicode when rendering drafts, especially in case of MailTo links
Problem
Bug explains the problem statement .. avoiding keeping it here for the sake of keeping internal details private.
Changes
Defense-in-depth - invisible Unicode stripped at 4 layers:
stripInvisibleUnicode.ts(new)sanitizeElement.tscheckXss.tsinsertLink()API + preventsscript:bypasslinkFormatHandler.tsCharacters stripped
Design decisions
roosterjs-content-model-domto avoid circular dependenciesTesting