Preserve HTML formatting in bookmarklet text selection capture#90
Preserve HTML formatting in bookmarklet text selection capture#90
Conversation
Co-authored-by: kraftbj <88897+kraftbj@users.noreply.github.com>
- Bookmarklet now captures selected HTML using getRangeAt()/cloneContents() and sends it as `sel_html` alongside the plain-text `s` field - Added htmlToBlocks() utility that converts HTML to Gutenberg block markup (handles headings h1-h6, paragraphs, ul/ol lists with nesting, blockquotes, and pre/code blocks; strips scripts/event handlers for safety) - buildSuggestedContent() now renders HTML selections as formatted blocks instead of plain-text in a quote block - App.js postMessage handler passes sel_html through to content builder - Added 20 new unit tests for htmlToBlocks and updated buildSuggestedContent - Added bookmarklet test for HTML selection capture - Rebuilt minified bookmarklet Co-authored-by: kraftbj <88897+kraftbj@users.noreply.github.com>
kraftbj
left a comment
There was a problem hiding this comment.
PR Review
Critical Issues (3)
-
Sanitizer uses denylist, not allowlist (
html-parser.js:459-486) — The JSDoc promises an allowlist of safe inline elements, but the implementation only denylists specific dangerous patterns. Everything else passes through:<form>,<svg>,<input>,<button>,<textarea>, plus all attributes likestyle,class,formaction, etc. -
data:andvbscript:URI schemes not stripped (html-parser.js:476-484) — Onlyjavascript:is blocked.data:text/html,...links execute scripts when clicked and survive sanitization into paragraph blocks. -
Silent content loss when
selectionHtmlproduces empty blocks (html-parser.js:711-715,App.js:228-229) — WhenselectionHtmlis truthy buthtmlToBlocks()returns empty (whitespace-only HTML, only script tags, etc.), no content block is produced. The description fallback is also defeated because App.js blanksdescriptionwhenselectionHtmlexists. The user's selection is silently discarded.
Important Issues (5)
-
Bookmarklet HTML capture has no try-catch (
bookmarklet.js:39-43) — IfcloneContents()throws (cross-shadow-DOM ranges, browser extensions), the entire bookmarklet halts with no popup and no feedback. -
Async event listener swallows errors (
App.js:189) —handleMessageis async but used as an event listener. Unhandled errors produce silent promise rejections. Consider wrapping the body in a top-level try-catch withconsole.error. (Pre-existing line, not in this diff, but this PR increases the risk by adding HTML parsing that can fail.) -
Unbounded recursion (
html-parser.js:517, 643) —listElementToBlockandhtmlToBlocks(blockquotes) recurse without depth limits. Crafted deeply nested HTML causes stack overflow. -
Entity encoding defeats plain-text optimization (
bookmarklet.js:311-314) —innerHTMLentity-encodes&etc. whiletoString()doesn't, soselectionHtml !== selectionis true even for unformatted text like "Tom & Jerry". -
buildSuggestedContentJSDoc doesn't documentselectionHtml— The new property changes function behavior significantly but isn't documented.
Strengths
- Core design is sound —
getRangeAt/cloneContentsto Gutenberg blocks is the right approach. - DOMParser is the correct choice for untrusted HTML parsing.
- Proper
escapeHtmlon text content within blocks. - Good test coverage for the happy path (all block types, nested lists, inline formatting, real-world example from the issue).
- Clean backward compatibility —
sel_htmlis additive, plain-textsstill sent.
Critical fixes: - Switch sanitizeInlineContent to allowlist (strong,em,b,i,u,s,a,code, mark,sub,sup,span,br); all other elements unwrapped, all attributes stripped except href on <a> - Block data: and vbscript: URI schemes alongside javascript: in href - Fix silent content loss: buildSuggestedContent falls back to description quote when selectionHtml produces empty blocks; App.js always computes description so the fallback is available Important fixes: - Wrap bookmarklet cloneContents() DOM calls in try-catch - Add depth limits: listElementToBlock cap at 10, htmlToBlocks blockquote recursion cap at 5 - Remove unreliable selectionHtml !== selection comparison (innerHTML entity-encodes & to &); always send sel_html when capture succeeded Also fix tempEl.firstChild.innerHTML → tempEl.innerHTML (heading/paragraph/ blockquote handlers were broken by the allowlist unwrapping block containers) Update JSDoc for buildSuggestedContent to document selectionHtml Add tests: data:/vbscript: blocking, non-allowlist unwrapping, attribute stripping, depth-limit safety, selectionHtml→empty→description fallback Co-authored-by: kraftbj <88897+kraftbj@users.noreply.github.com>
All critical and important issues addressed in 8d3ae6c — summary of changes: Critical (3/3 fixed):
Important (4/4 fixed):
Item 5 (async event listener) is pre-existing and unrelated to these changes; left as-is. |
Manual Testing StepsThe bookmarklet now captures HTML formatting from the user's selection and converts it to Gutenberg blocks. Here's what to test. Setup
Core Functionality: Formatted SelectionOn a page with rich content (ChatGPT output, a StackOverflow answer, a Wikipedia article), select a chunk that includes varied formatting, then click the bookmarklet.
Backward Compatibility
Security / Sanitization
Edge Cases
Error Resilience
|
…fix-formatting-issues
assets/bookmarklet.jsto capture selected HTML usinggetRangeAt()/cloneContents()/innerHTMLand send assel_htmlalongside plain-textsselectionHtml !== selectionguard; always sendsel_htmlwhen HTML capture succeededsanitizeInlineContent()with strict allowlist (strong, em, b, i, u, s, a[href], code, mark, sub, sup, span, br); unwraps all other elements; strips all attributes excepthrefon<a>data:andvbscript:URI schemes alongsidejavascript:in href checkslistElementToBlock()with depth limit (cap 10) to prevent stack overflowhtmlToBlocks(html)with blockquote recursion depth limit (cap 5)tempEl.firstChild.innerHTML→tempEl.innerHTML(heading/paragraph/blockquote handlers broke with allowlist sanitizer)buildSuggestedContent()to fall back todescriptionquote block whenselectionHtmlproduces empty blockssrc/App.jsto always computedescriptionfrom plain-text sources (let content builder decide priority)buildSuggestedContentto documentselectionHtmland fallback behaviorhtmlToBlocksfromsrc/utils/index.jsassets/bookmarklet.min.jsOriginal prompt
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.