Description
Issue Summary
The Problem
Headings automatically include an id attribute with a lowercased and dashed 'slug'-ish version of the heading text. E.g., the second-level heading 'My Favorite Book' will be rendered as <h2 id="my-favorite-book">.
This transformation also strips out a number of non-alphanumeric characters and percent-encodes non-ASCII ones. E.g., this heading:
"What's my favorite book?" you ask? Why, 'Moby Dick' of course! (I smile/laugh.)
…is turned into this HTML:
<h2 id="whats-my-favorite-book-you-ask-why-moby-dick-of-course-i-smilelaugh">"What's my favorite book?" you ask? Why, 'Moby Dick' of course! (I smile/laugh.)</h2>
As you can see, the punctuation is stripped out: single and double quotation marks, question marks, exclamation points, commas, parentheses, slashes, and periods.
But only some punctuation is stripped out. If I use curly/fancy/typographer's quotation marks or other punctuation or special characters, they are encoded instead of stripped. E.g., this heading:
“It’s me,” I said.
…is turned into this HTML:
<h2 id="%E2%80%9Cit%E2%80%99s-me%E2%80%9D-i-said">“It’s me,” I said.</h2>
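For reference, the behavior described so far can be approximated with a short JavaScript sketch. This is not Ghost's actual implementation: `observedSlug` and `STRIPPED_ASCII` are hypothetical names, and the stripped set is just the ASCII punctuation I found to be removed in testing. Under those assumptions, it reproduces both id values shown above.

```javascript
// Hypothetical approximation of the observed behavior; NOT Ghost's actual code.
// The character class holds the ASCII punctuation observed to be stripped.
const STRIPPED_ASCII = /['";,.<>\/\\?!\[\](){}@#$%^&*=_+~]/g;

function observedSlug(headingText) {
    const dashed = headingText
        .toLowerCase()
        .replace(STRIPPED_ASCII, '') // drop the ASCII punctuation
        .trim()
        .replace(/\s+/g, '-');       // runs of whitespace become a single dash
    // Whatever non-ASCII characters survived are percent-encoded as UTF-8.
    return encodeURIComponent(dashed);
}

console.log(observedSlug('My Favorite Book'));
console.log(observedSlug('\u201CIt\u2019s me,\u201D I said.'));
```

The second call prints the encoded id from the example, because the curly quotation marks are not in the stripped set and fall through to `encodeURIComponent`.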
Why It's a Problem
I see two problems here:
- The anchor URLs for linking to these headings are ugly and hard to read.
- The anchor URLs for these headings are not easy to guess, which means that editors who are trying to link to headings further down the page don't know what to put for URLs for internal links.
The Request
Would you consider stripping more characters from the heading id attribute?
In my testing, the following characters are removed from heading id attributes:
' " ; , . < > / \ ? ! [ ] ( ) { } @ # $ % ^ & * = _ + ~
But these characters are not removed:
‘ ’ “ ” ` ¡ ¿ - – — •
Related Tickets
This has been brought up before, in #13876 and #14179, but those tickets were closed because it is intentional that characters are encoded so that "when links or URLs are displayed by browsers they will appear as native characters."
I understand this goal, but I don't think punctuation should be preserved, and I think the characters listed above could safely be removed from these attribute values without causing problems or losing important information.
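To make the request concrete, here is a minimal sketch of what the proposed behavior could look like. It is an illustration only, not a patch against Ghost's code: `proposedSlug` and the constant names are hypothetical, and two choices are my own assumptions rather than part of the request. The ASCII hyphen is kept as the word separator, and en and em dashes are treated as word breaks instead of being deleted outright. Any remaining non-ASCII characters (such as accented letters) are still percent-encoded.

```javascript
// Hypothetical sketch of the requested behavior; NOT Ghost's actual code.
// ASCII punctuation that is already stripped today.
const ASCII_PUNCTUATION = /['";,.<>\/\\?!\[\](){}@#$%^&*=_+~]/g;
// Punctuation listed as currently surviving: ‘ ’ “ ” ` ¡ ¿ •
const TYPOGRAPHIC_PUNCTUATION = /[\u2018\u2019\u201C\u201D`\u00A1\u00BF\u2022]/g;
// En dash and em dash. Assumption: these act as word breaks.
const LONG_DASHES = /[\u2013\u2014]/g;

function proposedSlug(headingText) {
    const dashed = headingText
        .toLowerCase()
        .replace(ASCII_PUNCTUATION, '')
        .replace(TYPOGRAPHIC_PUNCTUATION, '')
        .replace(LONG_DASHES, ' ')
        .trim()
        .replace(/[\s-]+/g, '-')   // collapse runs of spaces and hyphens
        .replace(/^-+|-+$/g, '');  // no leading or trailing dash
    // Non-ASCII letters that remain are percent-encoded.
    return encodeURIComponent(dashed);
}

console.log(proposedSlug('\u201CIt\u2019s me,\u201D I said.')); // its-me-i-said
console.log(proposedSlug('\u201CIt\u2019s me\u2014\u201D I said')); // its-me-i-said
```

With this approach, an editor can guess the anchor by lowercasing the heading, dropping all punctuation, and joining the words with dashes.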
Steps to Reproduce
- In a post, make a new heading (e.g., a second-level heading).
- Use any of these special characters in a heading: ‘ ’ “ ” ` ¡ ¿ - – — • (e.g., “It’s me—” I said).
- Publish the post.
- In the published post, inspect the HTML for the heading.
- Note that the heading's id attribute is full of encoded punctuation characters.
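When inspecting the HTML, it can help to decode the id to see exactly which characters were encoded. The snippet below is a small sketch for a Node.js REPL or browser console; `nonAsciiCodePoints` is a hypothetical helper name, and the sample id is the one from the “It’s me,” example earlier in this report.

```javascript
// Hypothetical helper: list the non-ASCII characters hidden in an encoded id.
function nonAsciiCodePoints(encodedId) {
    const decoded = decodeURIComponent(encodedId);
    // Spread iterates by code point, so each character is handled whole.
    return [...decoded]
        .filter((ch) => ch.codePointAt(0) > 0x7f)
        .map((ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

const sampleId = '%E2%80%9Cit%E2%80%99s-me%E2%80%9D-i-said';
console.log(decodeURIComponent(sampleId));  // “it’s-me”-i-said
console.log(nonAsciiCodePoints(sampleId));  // [ 'U+201C', 'U+2019', 'U+201D' ]
```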
Ghost Version
5.109.2
Node.js Version
18.20.5
How did you install Ghost?
macOS Sequoia 15.3.1, ghost-cli, ghost install local
Database type
MySQL 5.7
Browser & OS version
n/a
Relevant log / error output
n/a
Code of Conduct
- I agree to be friendly and polite to people in this repository