Commit 2afac08

Add improved docs
1 parent b13a52d commit 2afac08

1 file changed

+153
-80
lines changed

readme.md

Lines changed: 153 additions & 80 deletions
@@ -8,7 +8,7 @@
88
[![Backers][backers-badge]][collective]
99
[![Chat][chat-badge]][chat]
1010

11-
[micromark][] extension to support GFM [literal autolinks][spec].
11+
[micromark][] extensions to support GFM [literal autolinks][spec].
1212

1313
## Contents
1414

@@ -19,6 +19,7 @@
1919
* [API](#api)
2020
* [`gfmAutolinkLiteral`](#gfmautolinkliteral)
2121
* [`gfmAutolinkLiteralHtml`](#gfmautolinkliteralhtml)
22+
* [Bugs](#bugs)
2223
* [Authoring](#authoring)
2324
* [HTML](#html)
2425
* [CSS](#css)
@@ -32,7 +33,7 @@
3233

3334
## What is this?
3435

35-
This package contains extensions that add support for the autolink syntax
36+
This package contains extensions that add support for the extra autolink syntax
3637
enabled by GFM to [`micromark`][micromark].
3738

3839
GitHub employs different algorithms to autolink: one at parse time and one at
@@ -43,27 +44,33 @@ But also because issues/PRs/comments omit (perhaps by accident?) the second
4344
algorithm for `www.`, `http://`, and `https://` links (but not for email links).
4445

4546
As this is a syntax extension, it focuses on the first algorithm.
46-
The second algorithm is performed by [`mdast-util-gfm-autolink-literal`][util].
47+
The second algorithm is performed by
48+
[`mdast-util-gfm-autolink-literal`][mdast-util-gfm-autolink-literal].
4749
The `html` part of this micromark extension does not operate on an AST and hence
4850
can’t perform the second algorithm.
4951

52+
The implementation of autolink literal on github.com is currently buggy.
53+
The bugs have been reported on [`cmark-gfm`][cmark-gfm].
54+
This micromark extension matches github.com except for its bugs.
55+
5056
## When to use this
5157

52-
These tools are all low-level.
53-
In many cases, you want to use [`remark-gfm`][plugin] with remark instead.
58+
This project is useful when you want to support autolink literals in markdown.
59+
60+
You can use these extensions when you are working with [`micromark`][micromark].
61+
To support all GFM features, use
62+
[`micromark-extension-gfm`][micromark-extension-gfm] instead.
5463

55-
Even when you want to use `micromark`, you likely want to use
56-
[`micromark-extension-gfm`][micromark-extension-gfm] to support all GFM
57-
features.
58-
That extension includes this extension.
64+
When you need a syntax tree, combine this package with
65+
[`mdast-util-gfm-autolink-literal`][mdast-util-gfm-autolink-literal].
5966

60-
When working with `mdast-util-from-markdown`, you must combine this package with
61-
[`mdast-util-gfm-autolink-literal`][util].
67+
All these packages are used in [`remark-gfm`][remark-gfm], which focusses on
68+
making it easier to transform content by abstracting these internals away.
6269

6370
## Install
6471

6572
This package is [ESM only][esm].
66-
In Node.js (version 12.20+, 14.14+, 16.0+, or 18.0+), install with [npm][]:
73+
In Node.js (version 16+), install with [npm][]:
6774

6875
```sh
6976
npm install micromark-extension-gfm-autolink-literal
@@ -108,47 +115,56 @@ Yields:
108115

109116
## API
110117

111-
This package exports the identifiers `gfmAutolinkLiteral` and
112-
`gfmAutolinkLiteralHtml`.
118+
This package exports the identifiers
119+
[`gfmAutolinkLiteral`][api-gfm-autolink-literal] and
120+
[`gfmAutolinkLiteralHtml`][api-gfm-autolink-literal-html].
113121
There is no default export.
114122

115-
The export map supports the endorsed [`development` condition][condition].
123+
The export map supports the [`development` condition][development].
116124
Run `node --conditions development module.js` to get instrumented dev code.
117125
Without this condition, production code is loaded.
118126

119127
### `gfmAutolinkLiteral`
120128

121-
Syntax extension for micromark (passed in `extensions`).
129+
Extension for `micromark` that can be passed in `extensions` to enable GFM
130+
autolink literal syntax ([`Extension`][micromark-extension]).
122131

123132
### `gfmAutolinkLiteralHtml`
124133

125-
HTML extension for micromark (can be passed in `htmlExtensions`).
134+
Extension for `micromark` that can be passed in `htmlExtensions` to support
135+
GFM autolink literals when serializing to HTML
136+
([`HtmlExtension`][micromark-html-extension]).
126137

127-
## Authoring
138+
## Bugs
128139

129-
When authoring markdown, it’s recommended *not* to use this construct.
130-
It is fragile (easy to get wrong) and not pretty to readers (it’s presented as
131-
just a URL, there is no descriptive text).
132-
Instead, use link (resource) or link (label):
140+
GitHub’s own algorithm to parse autolink literals contains three bugs.
141+
A smaller bug is left unfixed in this project for consistency.
142+
Two main bugs are not present in this project.
143+
The issues relating to autolink literals are:
133144

134-
```markdown
135-
Instead of https://example.com (worst), use <https://example.com> (better),
136-
or [link (resource)](https://example.com) or [link (reference)][ref] (best).
145+
* [GFM autolink extension (`www.`, `https?://` parts): links don’t work when
146+
after bracket](https://github.com/github/cmark-gfm/issues/278)\
147+
fixed here ✅
148+
* [GFM autolink extension (`www.` part): uppercase does not match on
149+
issues/PRs/comments](https://github.com/github/cmark-gfm/issues/280)\
150+
fixed here ✅
151+
* [GFM autolink extension (`www.` part): the word `www`
152+
matches](https://github.com/github/cmark-gfm/issues/279)\
153+
present here for consistency
137154

138-
[ref]: https://example.com
139-
```
155+
## Authoring
140156

141-
When authoring markdown where the source does not matter (such as comments to
142-
some page), it can be useful to quickly paste URLs, and this will mostly work.
157+
It is recommended to use labels, either with a resource or a definition,
158+
instead of autolink literals, as those allow relative URLs and descriptive
159+
text to explain the URL in prose.
143160

144161
## HTML
145162

146-
GFM autolink literals, similar to normal CommonMark autolinks (such as
147-
`<https://example.com>`), relate to the `<a>` element in HTML.
148-
See [*§ 4.5.1 The `a` element*][html] in the HTML spec for more info.
149-
When an email autolink is used, the string `mailto:` is prepended before the
150-
email, when generating the `href` attribute of the hyperlink.
151-
When a `www` autolink is used, the string `http://` is prepended.
163+
GFM autolink literals relate to the `<a>` element in HTML.
164+
See [*§ 4.5.1 The `a` element*][html-a] in the HTML spec for more info.
165+
When an email autolink is used, the string `mailto:` is prepended when
166+
generating the `href` attribute of the hyperlink.
167+
When a www autolink is used, the string `http://` is prepended.
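
The prefixing rules above can be sketched in a few lines. This is only an illustration, not the extension's implementation: the function name `hrefForLiteral` and its `kind` parameter are hypothetical and exist purely for this example.

```js
// Hypothetical helper: not part of this package's API.
// `kind` is one of 'email', 'www', or 'protocol' (names assumed for this sketch).
function hrefForLiteral(kind, text) {
  if (kind === 'email') return 'mailto:' + text
  if (kind === 'www') return 'http://' + text
  // Protocol literals (`http://`, `https://`) already carry a scheme.
  return text
}

console.log(hrefForLiteral('email', 'contact@example.com')) // mailto:contact@example.com
console.log(hrefForLiteral('www', 'www.example.com')) // http://www.example.com
```

Note that only the `href` changes: the visible link text stays exactly as it was written in the markdown source.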
152168

153169
## CSS
154170

@@ -183,64 +199,107 @@ a:not([href]) {
183199

184200
## Syntax
185201

186-
Autolink literals are very complex to parse.
187-
They form with, roughly, the following BNF:
202+
Autolink literals form with, roughly, the following BNF:
188203

189204
```bnf
190-
; Restriction: not allowed to be in unbalanced braces.
191-
autolink ::= www-autolink | http-autolink | email-autolink
205+
gfm_autolink_literal ::= gfm_protocol_autolink | gfm_www_autolink | gfm_email_autolink
192206
193-
; Restriction: the code before must be `www-autolink-before`.
194-
www-autolink ::= 3( "w" | "W" ) "." [ domain [ path ] ]
195-
www-autolink-before ::= eof | eol | space-or-tab | "(" | "*" | "_" | "~"
207+
; Restriction: the code before must be `www_autolink_before`.
208+
; Restriction: the code after `.` must not be eof.
209+
www_autolink ::= 3('w' | 'W') '.' [domain [path]]
210+
www_autolink_before ::= eof | eol | space_or_tab | '(' | '*' | '_' | '[' | ']' | '~'
196211
197-
; Restriction: the code before must be `http-autolink-before`.
198-
; Restriction: the code after the protocol must be `http-autolink-protocol-after`.
199-
http-autolink ::= ( "h" | "H" ) 2( "t" | "T" ) ( "p" | "P" ) [ "s" | "S" ] ":" 2"/" domain [ path ]
200-
http-autolink-before ::= code - ascii-alpha
201-
http-autolink-protocol-after ::= code - eof - eol - ascii-control - unicode-whitespace - unicode-punctuation
212+
; Restriction: the code before must be `http_autolink_before`.
213+
; Restriction: the code after the protocol must be `http_autolink_protocol_after`.
214+
http_autolink ::= ('h' | 'H') 2('t' | 'T') ('p' | 'P') ['s' | 'S'] ':' 2'/' domain [path]
215+
http_autolink_before ::= byte - ascii_alpha
216+
http_autolink_protocol_after ::= byte - eof - eol - ascii_control - unicode_whitespace - unicode_punctuation
202217
203-
; Restriction: the code before must be `email-autolink-before`.
204-
; Restriction: `ascii-digit` may not occur in the last label part of the label.
205-
email-autolink ::= 1*( "+" | "-" | "." | "_" | ascii-alphanumeric ) "@" 1*( 1*label-segment label-dot-cont ) 1*label-segment
206-
email-autolink-before ::= code - ascii-alpha - "/"
218+
; Restriction: the code before must be `email_autolink_before`.
219+
; Restriction: `ascii_digit` may not occur in the last label part of the label.
220+
email_autolink ::= 1*('+' | '-' | '.' | '_' | ascii_alphanumeric) '@' 1*(1*label_segment label_dot_cont) 1*label_segment
221+
email_autolink_before ::= byte - ascii_alpha - '/'
207222
208223
; Restriction: `_` may not occur in the last two domain parts.
209-
domain ::= 1*( url-ampt-cont | domain-punct-cont | "-" | code - eof - ascii-control - unicode-whitespace - unicode-punctuation )
224+
domain ::= 1*(url_ampt_cont | domain_punct_cont | '-' | byte - eof - ascii_control - unicode_whitespace - unicode_punctuation)
210225
; Restriction: must not be followed by `punct`.
211-
domain-punct-cont ::= "." | "_"
226+
domain_punct_cont ::= '.' | '_'
212227
; Restriction: must not be followed by `char-ref`.
213-
url-ampt-cont ::= "&"
228+
url_ampt_cont ::= '&'
214229
215230
; Restriction: a counter `balance = 0` is increased for every `(`, and decreased for every `)`.
216-
; Restriction: `)` must not be `paren-at-end`.
217-
path ::= 1*( url-ampt-cont | path-punctuation-cont | "(" | ")" | code - eof - eol - space-or-tab )
231+
; Restriction: `)` must not be `paren_at_end`.
232+
path ::= 1*(url_ampt_cont | path_punctuation_cont | '(' | ')' | byte - eof - eol - space_or_tab)
218233
; Restriction: must not be followed by `punct`.
219-
path-punctuation-cont ::= trailing-punctuation - "<"
234+
path_punctuation_cont ::= trailing_punctuation - '<'
220235
; Restriction: must be followed by `punct` and `balance` must be less than `0`.
221-
paren-at-end ::= ")"
236+
paren_at_end ::= ')'
222237
223-
label-segment ::= label-dash-underscore-cont | ascii-alpha | ascii-digit
238+
label_segment ::= label_dash_underscore_cont | ascii_alpha | ascii_digit
224239
; Restriction: if followed by `punct`, the whole email autolink is invalid.
225-
label-dash-underscore-cont ::= "-" | "_"
240+
label_dash_underscore_cont ::= '-' | '_'
226241
; Restriction: must not be followed by `punct`.
227-
label-dot-cont ::= "."
242+
label_dot_cont ::= '.'
243+
244+
punct ::= *trailing_punctuation ( byte - eof - eol - space_or_tab - '<' )
245+
char_ref ::= *ascii_alpha ';' path_end
246+
trailing_punctuation ::= '!' | '"' | '\'' | ')' | '*' | ',' | '.' | ':' | ';' | '<' | '?' | '_' | '~'
247+
```
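
The `balance` restriction on `path` is probably the least obvious part of the grammar. The sketch below follows the simpler wording of the GFM spec (a trailing `)` is left out of the link while closing parentheses outnumber opening ones) rather than this extension's tokenizer, so treat it as an approximation. The name `splitTrailingParens` is hypothetical.

```js
// Hypothetical helper approximating the GFM spec's trailing-paren rule.
// It does not handle the other trailing punctuation rules of the grammar.
function splitTrailingParens(url) {
  let open = 0
  let close = 0

  // Count all parentheses in the candidate link.
  for (const char of url) {
    if (char === '(') open++
    else if (char === ')') close++
  }

  // Drop trailing `)` while there are more closing than opening parens.
  let end = url.length
  while (end > 0 && url[end - 1] === ')' && close > open) {
    end--
    close--
  }

  return {link: url.slice(0, end), rest: url.slice(end)}
}

console.log(splitTrailingParens('www.google.com/search?q=Markup+(business)))'))
// { link: 'www.google.com/search?q=Markup+(business)', rest: '))' }
```

Balanced parentheses inside the URL are kept, so a link that ends in `(business)` stays whole.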
248+
249+
The grammar for GFM autolink literal is very relaxed: basically anything
250+
except for whitespace is allowed after a prefix.
251+
To use whitespace characters and otherwise impossible characters, in URLs,
252+
you can use percent encoding:
253+
254+
```markdown
255+
https://example.com/alpha%20bravo
256+
```
257+
258+
Yields:
259+
260+
```html
261+
<p><a href="https://example.com/alpha%20bravo">https://example.com/alpha%20bravo</a></p>
262+
```
263+
264+
There are several cases where incorrect encoding of URLs would, in other
265+
languages, result in a parse error.
266+
In markdown, there are no errors, and URLs are normalized.
267+
In addition, many characters are percent encoded
268+
([`sanitizeUri`][micromark-util-sanitize-uri]).
269+
For example:
270+
271+
```markdown
272+
www.a👍b%
273+
```
228274

229-
punct ::= *trailing-punctuation ( code - eof - eol - space-or-tab - "<" )
230-
char-ref ::= *ascii-alpha ";" path-end
231-
trailing-punctuation ::= "!" | "\"" | "'" | ")" | "*" | "," | "." | ":" | ";" | "<" | '?' | '_' | '~'
275+
Yields:
276+
277+
```html
278+
<p><a href="http://www.a%F0%9F%91%8Db%25">www.a👍b%</a></p>
232279
```
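
To make the normalization above concrete, here is a minimal sketch that reproduces the `href` from this example. It is not the real `sanitizeUri`: the helper name `encodeUnsafe` is hypothetical, and it only handles two cases, namely a `%` that does not start a valid escape and non-ASCII characters.

```js
// Hypothetical helper: a simplified stand-in, not `micromark-util-sanitize-uri`.
function encodeUnsafe(url) {
  // Match a `%` not followed by two hex digits, or a run of non-ASCII characters.
  return url.replace(/%(?![0-9A-Fa-f]{2})|[^\x00-\x7F]+/gu, (match) =>
    match === '%' ? '%25' : encodeURIComponent(match)
  )
}

console.log(encodeUnsafe('http://www.a👍b%'))
// http://www.a%F0%9F%91%8Db%25
console.log(encodeUnsafe('https://example.com/alpha%20bravo'))
// https://example.com/alpha%20bravo
```

Valid escapes such as `%20` pass through untouched, which is why the earlier percent-encoded example is unchanged in its output.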
233280

281+
There is a big difference between how www and protocol literals work
282+
compared to how email literals work.
283+
The first two are done when parsing, and work like anything else in
284+
markdown.
285+
But email literals are handled afterwards: when everything is parsed, we
286+
look back at the events to figure out if there were email addresses.
287+
This particularly affects how they interleave with character escapes and
288+
character references.
289+
234290
## Types
235291

236292
This package is fully typed with [TypeScript][].
237293
It exports no additional types.
238294

239295
## Compatibility
240296

241-
This package is at least compatible with all maintained versions of Node.js.
242-
As of now, that is Node.js 12.20+, 14.14+, 16.0+, and 18.0+.
243-
It also works in Deno and modern browsers.
297+
Projects maintained by the unified collective are compatible with all maintained
298+
versions of Node.js.
299+
As of now, that is Node.js 16+.
300+
Our projects sometimes work with older versions, but this is not guaranteed.
301+
302+
These extensions work with `micromark` version 3+.
244303

245304
## Security
246305

@@ -250,12 +309,14 @@ construct always produces safe links.
250309

251310
## Related
252311

253-
* [`syntax-tree/mdast-util-gfm-autolink-literal`][util]
254-
— support GFM autolink literals in mdast
255-
* [`syntax-tree/mdast-util-gfm`][mdast-util-gfm]
256-
— support GFM in mdast
257-
* [`remarkjs/remark-gfm`][plugin]
258-
— support GFM in remark
312+
* [`micromark-extension-gfm`][micromark-extension-gfm]
313+
— support all of GFM
314+
* [`mdast-util-gfm-autolink-literal`][mdast-util-gfm-autolink-literal]
315+
— support GFM autolink literals in mdast
316+
* [`mdast-util-gfm`][mdast-util-gfm]
317+
— support all of GFM in mdast
318+
* [`remark-gfm`][remark-gfm]
319+
— support all of GFM in remark
259320

260321
## Contribute
261322

@@ -317,20 +378,32 @@ abide by its terms.
317378

318379
[typescript]: https://www.typescriptlang.org
319380

320-
[condition]: https://nodejs.org/api/packages.html#packages_resolving_user_conditions
381+
[development]: https://nodejs.org/api/packages.html#packages_resolving_user_conditions
321382

322-
[util]: https://github.com/syntax-tree/mdast-util-gfm-autolink-literal
383+
[micromark]: https://github.com/micromark/micromark
384+
385+
[micromark-extension-gfm]: https://github.com/micromark/micromark-extension-gfm
323386

324-
[plugin]: https://github.com/remarkjs/remark-gfm
387+
[micromark-util-sanitize-uri]: https://github.com/micromark/micromark/tree/main/packages/micromark-util-sanitize-uri
388+
389+
[micromark-extension]: https://github.com/micromark/micromark#syntaxextension
390+
391+
[micromark-html-extension]: https://github.com/micromark/micromark#htmlextension
392+
393+
[mdast-util-gfm]: https://github.com/syntax-tree/mdast-util-gfm
394+
395+
[mdast-util-gfm-autolink-literal]: https://github.com/syntax-tree/mdast-util-gfm-autolink-literal
396+
397+
[remark-gfm]: https://github.com/remarkjs/remark-gfm
325398

326399
[spec]: https://github.github.com/gfm/#autolinks-extension-
327400

328-
[html]: https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-a-element
401+
[html-a]: https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-a-element
329402

330403
[css]: https://github.com/sindresorhus/github-markdown-css
331404

332-
[micromark]: https://github.com/micromark/micromark
405+
[cmark-gfm]: https://github.com/github/cmark-gfm
333406

334-
[micromark-extension-gfm]: https://github.com/micromark/micromark-extension-gfm
407+
[api-gfm-autolink-literal]: #gfmautolinkliteral
335408

336-
[mdast-util-gfm]: https://github.com/syntax-tree/mdast-util-gfm
409+
[api-gfm-autolink-literal-html]: #gfmautolinkliteralhtml
