Commit 7fca0c1

cli-add-table-plugin (#146)
* cli: add --plugin-table
* docs: add table plugin to readme
* docs: add TagType and RendererFor to readme
* cli: added more table flags & improved validation
* converter: add error message prefix for plugin init
1 parent 3f762f5 commit 7fca0c1

19 files changed, +365 -58 lines changed

.github/images/point_table.png

190 KB
52.1 KB

README.md

Lines changed: 55 additions & 16 deletions
@@ -39,7 +39,9 @@ Here are some _cool features_:
 
 ![](./.github/images/point_strikethrough.png)
 
----
+- **Table Plugin:** Converts tables with support for alignment, rowspan and colspan.
+
+![](./.github/images/point_table.png)
 
 ---

@@ -145,6 +147,8 @@ func main() {
 				commonmark.WithStrongDelimiter("__"),
 				// ...additional configurations for the plugin
 			),
+
+			// ...additional plugins (e.g. table)
 		),
 	)

@@ -162,27 +166,55 @@ func main() {
 > [!NOTE]
 > If you use `NewConverter` directly make sure to also **register the commonmark and base plugin**.
 
+---
+
+### Collapse & Tag Type
+
+![](./.github/images/tag_type_renderer.png)
+
+You can specify how different HTML tags should be handled during conversion.
+
+- **Tag Types:** When _collapsing_ whitespace it is useful to know if a node is _block_ or _inline_.
+  - So if you have Web Components/Custom Elements remember to register the type using `TagType` or `RendererFor`.
+  - Additionally, you can _remove_ tags completely from the output.
+- **Pre-built Renderers:** There are several pre-built renderers available. For example:
+  - `RenderAsHTML` will render the node (including children) as HTML.
+  - `RenderAsHTMLWrapper` will render the node as HTML and render the children as markdown.
+
+> [!NOTE]
+> By default, some tags are automatically removed (e.g. `<style>`). You can override existing configuration by using a different _priority_. For example, you could keep `<style>` tags by registering them with `PriorityEarly`.
+
+Here are the examples for the screenshot above:
+
+```go
+conv.Register.TagType("nav", converter.TagTypeRemove, converter.PriorityStandard)
+
+conv.Register.RendererFor("b", converter.TagTypeInline, base.RenderAsHTML, converter.PriorityEarly)
+
+conv.Register.RendererFor("article", converter.TagTypeBlock, base.RenderAsHTMLWrapper, converter.PriorityStandard)
+```
+
 ### Plugins
 
 #### Published Plugins
 
 These are the plugins located in the [plugin folder](/plugin):
 
-| Name | Description |
-| --------------------- | ------------------------------------------------------------------------------------ |
-| Base | Implements basic shared functionality (e.g. removing nodes) |
-| Commonmark | Implements Markdown according to the [Commonmark Spec](https://spec.commonmark.org/) |
-| | |
-| GitHubFlavored | _planned_ |
-| TaskListItems | _planned_ |
-| Strikethrough | Converts `<strike>`, `<s>`, and `<del>` to the `~~` syntax. |
-| Table | _planned_ |
-| | |
-| VimeoEmbed | _planned_ |
-| YoutubeEmbed | _planned_ |
-| | |
-| ConfluenceCodeBlock | _planned_ |
-| ConfluenceAttachments | _planned_ |
+| Name | Description |
+| --------------------- | -------------------------------------------------------------------------------------------------- |
+| Base | Implements basic shared functionality (e.g. removing nodes) |
+| Commonmark | Implements Markdown according to the [Commonmark Spec](https://spec.commonmark.org/) |
+| | |
+| GitHubFlavored | _planned_ |
+| TaskListItems | _planned_ |
+| Strikethrough | Converts `<strike>`, `<s>`, and `<del>` to the `~~` syntax. |
+| Table | Implements Tables according to the [GitHub Flavored Markdown Spec](https://github.github.com/gfm/) |
+| | |
+| VimeoEmbed | _planned_ |
+| YoutubeEmbed | _planned_ |
+| | |
+| ConfluenceCodeBlock | _planned_ |
+| ConfluenceAttachments | _planned_ |
 
 > [!NOTE]
 > Not all the plugins from v1 are already ported to v2. These will soon be implemented...
@@ -280,11 +312,18 @@ This domain is for use in illustrative examples in documents. You may use this d
 [More information...](https://www.iana.org/domains/example)
 ```
 
+```bash
+$ html2markdown --input file.html --output file.md
+
+$ html2markdown --input "src/*.html" --output "dist/"
+```
+
 Use `--help` to learn about the configurations, for example:
 
 - `--domain="https://example.com"` to convert _relative_ links to _absolute_ links.
 - `--exclude-selector=".ad"` to exclude the html elements with `class="ad"` from the conversion.
 - `--include-selector="article"` to only include the `<article>` html elements in the conversion.
+- `--plugin-strikethrough` or `--plugin-table` to enable plugins.
 
 _(The cli does not support every option yet. Over time more customization will be added)_

cli/html2markdown/cmd/cmd_convert.go

Lines changed: 18 additions & 5 deletions
@@ -2,13 +2,15 @@ package cmd
 
 import (
 	"bytes"
+	"errors"
 	"fmt"
 
 	"github.com/JohannesKaufmann/dom"
 	"github.com/JohannesKaufmann/html-to-markdown/v2/converter"
 	"github.com/JohannesKaufmann/html-to-markdown/v2/plugin/base"
 	"github.com/JohannesKaufmann/html-to-markdown/v2/plugin/commonmark"
 	"github.com/JohannesKaufmann/html-to-markdown/v2/plugin/strikethrough"
+	"github.com/JohannesKaufmann/html-to-markdown/v2/plugin/table"
 	"github.com/andybalholm/cascadia"
 	"golang.org/x/net/html"
 )
@@ -99,8 +101,18 @@ func (cli *CLI) convert(input []byte) ([]byte, error) {
 		),
 	)
 	if cli.config.enablePluginStrikethrough {
-		// TODO: while this works, this does not add the `Name` to the internal list
-		strikethrough.NewStrikethroughPlugin().Init(conv)
+		conv.Register.Plugin(strikethrough.NewStrikethroughPlugin())
+	}
+
+	if cli.config.enablePluginTable {
+		conv.Register.Plugin(
+			table.NewTablePlugin(
+				table.WithSkipEmptyRows(cli.config.tableSkipEmptyRows),
+				table.WithHeaderPromotion(cli.config.tableHeaderPromotion),
+				table.WithSpanCellBehavior(table.SpanCellBehavior(cli.config.tableSpanCellBehavior)),
+				table.WithPresentationTables(cli.config.tablePresentationTables),
+			),
+		)
 	}
 
 	doc, err := cli.parseInputWithSelectors(input)
@@ -110,9 +122,10 @@ func (cli *CLI) convert(input []byte) ([]byte, error) {
 
 	markdown, err := conv.ConvertNode(doc, converter.WithDomain(cli.config.domain))
 	if err != nil {
-		e, ok := err.(*commonmark.ValidateConfigError)
-		if ok {
-			return nil, overrideValidationError(e)
+
+		var validationErr *commonmark.ValidateConfigError
+		if errors.As(err, &validationErr) {
+			return nil, overrideValidationError(validationErr)
 		}
 
 		return nil, err
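
The move from a type assertion to `errors.As` in the hunk above matters once an error is wrapped, which is what the "add error message prefix for plugin init" item in the commit message suggests. The following is a minimal sketch of the difference, assuming the prefix is added with `%w`. `ValidateConfigError` here is a hypothetical local stand-in (its fields are invented for illustration), not the type from the `commonmark` package, and `initPlugin` is likewise made up.

```go
package main

import (
	"errors"
	"fmt"
)

// ValidateConfigError is a hypothetical stand-in for the validation error
// type referenced in the diff. The fields are assumptions for illustration.
type ValidateConfigError struct {
	Key   string
	Value string
}

func (e *ValidateConfigError) Error() string {
	return fmt.Sprintf("invalid value %q for %s", e.Value, e.Key)
}

// initPlugin mimics a caller that prefixes an error coming from plugin
// initialization. Wrapping with %w keeps the original error in the chain.
func initPlugin() error {
	err := &ValidateConfigError{Key: "StrongDelimiter", Value: "!!"}
	return fmt.Errorf("error while initializing plugin: %w", err)
}

// matchesByAssertion checks only the outermost error value.
func matchesByAssertion(err error) bool {
	_, ok := err.(*ValidateConfigError)
	return ok
}

// matchesByAs walks the whole error chain looking for the target type.
func matchesByAs(err error) bool {
	var target *ValidateConfigError
	return errors.As(err, &target)
}

func main() {
	err := initPlugin()
	fmt.Println("type assertion:", matchesByAssertion(err))
	fmt.Println("errors.As:     ", matchesByAs(err))
}
```

Under these assumptions the type assertion no longer matches, because the outermost value is the wrapper created by `fmt.Errorf`, while `errors.As` still finds the validation error further down the chain.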

cli/html2markdown/cmd/exec.go

Lines changed: 6 additions & 0 deletions
@@ -39,6 +39,12 @@ type Config struct {
 
 	// - - - - - Plugins - - - - - //
 	enablePluginStrikethrough bool
+
+	enablePluginTable       bool
+	tableSkipEmptyRows      bool
+	tableHeaderPromotion    bool
+	tableSpanCellBehavior   string
+	tablePresentationTables bool
 }
 
 // Release holds the information (from the 3 ldflags) that goreleaser sets.

cli/html2markdown/cmd/exec_test.go

Lines changed: 117 additions & 1 deletion
@@ -422,6 +422,32 @@ func TestExecute(t *testing.T) {
 			},
 		},
 
+		// - - - - - validation of options (plugin) - - - - - //
+		{
+			desc: "[validation] option requires plugin",
+
+			input: CLIGoldenInput{
+				modeStdin:  modePipe,
+				modeStdout: modePipe,
+				modeStderr: modePipe,
+
+				inputStdin: []byte("<strong>text</strong>"),
+				inputArgs:  []string{"html2markdown", `--opt-table-skip-empty-rows`},
+			},
+		},
+		{
+			desc: "[validation] plugin option invalid value",
+
+			input: CLIGoldenInput{
+				modeStdin:  modePipe,
+				modeStdout: modePipe,
+				modeStderr: modePipe,
+
+				inputStdin: []byte("<strong>text</strong>"),
+				inputArgs:  []string{"html2markdown", `--plugin-table`, `--opt-table-span-cell-behavior=random`},
+			},
+		},
+
 		// - - - - - files (--input and --output) - - - - - //
 		{
 			desc: "[files] without suffix existing dir",
@@ -591,7 +617,7 @@ func TestExecute_General(t *testing.T) {
 
 func TestExecute_Plugins(t *testing.T) {
 	testCases := []CLITestCase{
-
+		// - - - - - plugin: strikethrough - - - - - //
 		{
 			desc: "[plugin-strikethrough] disabled by default",

@@ -608,6 +634,96 @@ func TestExecute_Plugins(t *testing.T) {
 
 			expectedStdout: []byte("Some ~~outdated~~ text\n"),
 		},
+
+		// - - - - - plugin: table - - - - - //
+		{
+			desc: "[plugin-table] disabled by default",
+
+			inputStdin: []byte(`
+			<table>
+				<tr>
+					<td>A1</td>
+					<td>A2</td>
+				</tr>
+				<tr>
+					<td>B1</td>
+					<td>B2</td>
+				</tr>
+			</table>
+			`),
+			inputArgs: []string{"html2markdown"},
+
+			expectedStdout: []byte("A1 A2 B1 B2\n"),
+		},
+		{
+			desc: "[plugin-table] enabled",
+
+			inputStdin: []byte(`
+			<table>
+				<tr>
+					<td>A1</td>
+					<td>A2</td>
+				</tr>
+				<tr>
+					<td></td>
+					<td></td>
+				</tr>
+				<tr>
+					<td>C1</td>
+					<td>C2</td>
+				</tr>
+			</table>
+			`),
+			inputArgs: []string{"html2markdown", "--plugin-table"},
+
+			expectedStdout: []byte("| | |\n|----|----|\n| A1 | A2 |\n| | |\n| C1 | C2 |\n"),
+		},
+		{
+			desc: "[plugin-table] skip empty rows",
+
+			inputStdin: []byte(`
+			<table>
+				<tr>
+					<td>A1</td>
+					<td>A2</td>
+				</tr>
+				<tr>
+					<td></td>
+					<td></td>
+				</tr>
+				<tr>
+					<td>C1</td>
+					<td>C2</td>
+				</tr>
+			</table>
+			`),
+			inputArgs: []string{"html2markdown", "--plugin-table", "--opt-table-skip-empty-rows"},
+
+			expectedStdout: []byte("| | |\n|----|----|\n| A1 | A2 |\n| C1 | C2 |\n"),
+		},
+		{
+			desc: "[plugin-table] skip empty rows & header promotion",
+
+			inputStdin: []byte(`
+			<table>
+				<tr>
+					<td>A1</td>
+					<td>A2</td>
+				</tr>
+				<tr>
+					<td></td>
+					<td></td>
+				</tr>
+				<tr>
+					<td>C1</td>
+					<td>C2</td>
+				</tr>
+			</table>
+			`),
+			inputArgs: []string{"html2markdown", "--plugin-table", "--opt-table-skip-empty-rows", "--opt-table-header-promotion"},
+
+			expectedStdout: []byte("| A1 | A2 |\n|----|----|\n| C1 | C2 |\n"),
+		},
 	}
 	for _, tC := range testCases {
 		t.Run(tC.desc, func(t *testing.T) {
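
The effect of `--opt-table-skip-empty-rows` and `--opt-table-header-promotion` in the test cases above can be illustrated with a small stand-alone renderer. This is only a sketch under stated assumptions, not the table plugin's implementation: `renderTable` and `isEmptyRow` are hypothetical helpers, the input is assumed to be already-extracted cell text, and the way empty cells are padded is a guess.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// isEmptyRow reports whether every cell in the row is blank.
func isEmptyRow(row []string) bool {
	for _, cell := range row {
		if strings.TrimSpace(cell) != "" {
			return false
		}
	}
	return true
}

// renderTable is a hypothetical helper that renders rows as a pipe table.
// Without header promotion an empty header row is emitted, because the
// table syntax requires a header above the delimiter row.
func renderTable(rows [][]string, skipEmptyRows, headerPromotion bool) string {
	if skipEmptyRows {
		var kept [][]string
		for _, row := range rows {
			if !isEmptyRow(row) {
				kept = append(kept, row)
			}
		}
		rows = kept
	}
	if len(rows) == 0 {
		return ""
	}

	cols := len(rows[0])
	header := make([]string, cols)
	if headerPromotion {
		header, rows = rows[0], rows[1:]
	}

	// Compute the width of each column so that the cells line up.
	widths := make([]int, cols)
	for _, row := range append([][]string{header}, rows...) {
		for i, cell := range row {
			if n := utf8.RuneCountInString(cell); i < cols && n > widths[i] {
				widths[i] = n
			}
		}
	}

	var sb strings.Builder
	writeRow := func(cells []string) {
		sb.WriteString("|")
		for i := 0; i < cols; i++ {
			cell := ""
			if i < len(cells) {
				cell = cells[i]
			}
			fmt.Fprintf(&sb, " %-*s |", widths[i], cell)
		}
		sb.WriteString("\n")
	}

	writeRow(header)
	sb.WriteString("|")
	for _, w := range widths {
		sb.WriteString(strings.Repeat("-", w+2) + "|")
	}
	sb.WriteString("\n")
	for _, row := range rows {
		writeRow(row)
	}
	return sb.String()
}

func main() {
	rows := [][]string{{"A1", "A2"}, {"", ""}, {"C1", "C2"}}
	fmt.Print(renderTable(rows, true, true))
}
```

With both options enabled, the sketch produces the same three lines as the last test case: the first non-empty row becomes the header and the empty row is dropped.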

cli/html2markdown/cmd/flags.go

Lines changed: 22 additions & 0 deletions
@@ -88,6 +88,11 @@ func (cli *CLI) initFlags(progname string) {
 	// TODO: --opt-strikethrough-delimiter for the strikethrough plugin
 	cli.flags.BoolVar(&cli.config.enablePluginStrikethrough, "plugin-strikethrough", false, "enable the plugin ~~strikethrough~~")
 
+	cli.flags.BoolVar(&cli.config.enablePluginTable, "plugin-table", false, "enable the plugin table")
+	cli.flags.BoolVar(&cli.config.tableSkipEmptyRows, "opt-table-skip-empty-rows", false, "[for --plugin-table] omit empty rows from the output")
+	cli.flags.BoolVar(&cli.config.tableHeaderPromotion, "opt-table-header-promotion", false, "[for --plugin-table] first row should be treated as a header")
+	cli.flags.StringVar(&cli.config.tableSpanCellBehavior, "opt-table-span-cell-behavior", "", `[for --plugin-table] how colspan/rowspan should be rendered: "empty" or "mirror"`)
+	cli.flags.BoolVar(&cli.config.tablePresentationTables, "opt-table-presentation-tables", false, `[for --plugin-table] whether tables with role="presentation" should be converted`)
 }
 
 func (cli *CLI) parseFlags(args []string) error {
func (cli *CLI) parseFlags(args []string) error {
@@ -98,5 +103,22 @@ func (cli *CLI) parseFlags(args []string) error {
 
 	cli.config.args = cli.flags.Args()
 
+	// Validate flag dependencies
+	if cli.config.tableSkipEmptyRows && !cli.config.enablePluginTable {
+		return fmt.Errorf("--opt-table-skip-empty-rows requires --plugin-table to be enabled")
+	}
+	if cli.config.tableHeaderPromotion && !cli.config.enablePluginTable {
+		return fmt.Errorf("--opt-table-header-promotion requires --plugin-table to be enabled")
+	}
+	if cli.config.tableSpanCellBehavior != "" && !cli.config.enablePluginTable {
+		return fmt.Errorf("--opt-table-span-cell-behavior requires --plugin-table to be enabled")
+	}
+	if cli.config.tablePresentationTables && !cli.config.enablePluginTable {
+		return fmt.Errorf("--opt-table-presentation-tables requires --plugin-table to be enabled")
+	}
+
+	// TODO: use constant for flag name & use formatFlag
+	// var keyStrongDelimiter = "opt-strong-delimiter"
+
 	return nil
 }
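
The dependency checks added to `parseFlags` repeat the same pattern for every option. One possible way to address the TODO is a small table of dependent flags. The sketch below assumes the standard library `flag` package and uses a hypothetical `config` struct and `parse` helper that mirror only part of the CLI; it is not the project's code.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// config mirrors a subset of the Config fields added in this commit.
type config struct {
	enablePluginTable     bool
	tableSkipEmptyRows    bool
	tableSpanCellBehavior string
}

// parse is a hypothetical helper: it registers the flags, parses args and
// then rejects table options that were given without --plugin-table.
func parse(args []string) (*config, error) {
	var cfg config

	fs := flag.NewFlagSet("html2markdown", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage output out of the way in this sketch
	fs.BoolVar(&cfg.enablePluginTable, "plugin-table", false, "enable the plugin table")
	fs.BoolVar(&cfg.tableSkipEmptyRows, "opt-table-skip-empty-rows", false, "[for --plugin-table] omit empty rows from the output")
	fs.StringVar(&cfg.tableSpanCellBehavior, "opt-table-span-cell-behavior", "", `[for --plugin-table] how colspan/rowspan should be rendered: "empty" or "mirror"`)

	if err := fs.Parse(args); err != nil {
		return nil, err
	}

	// Each entry names a flag and whether the user set it.
	dependents := []struct {
		name string
		set  bool
	}{
		{"opt-table-skip-empty-rows", cfg.tableSkipEmptyRows},
		{"opt-table-span-cell-behavior", cfg.tableSpanCellBehavior != ""},
	}
	for _, d := range dependents {
		if d.set && !cfg.enablePluginTable {
			return nil, fmt.Errorf("--%s requires --plugin-table to be enabled", d.name)
		}
	}

	return &cfg, nil
}

func main() {
	_, err := parse([]string{"--opt-table-skip-empty-rows"})
	fmt.Println(err)
}
```

Keeping the flag name in one place means the error message and the registration cannot drift apart as easily, which seems to be the intent behind the TODO.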

cli/html2markdown/cmd/testdata/TestExecute/[general]_help_pipe/stdout.golden

Lines changed: 15 additions & 0 deletions
@@ -72,9 +72,24 @@ Use a HTML sanitizer before displaying the HTML in the browser!
 Make bold text. Should <strong> be indicated by two asterisks or two underscores?
 "**" or "__" (default: "**")
 
+--opt-table-header-promotion
+[for --plugin-table] first row should be treated as a header
+
+--opt-table-presentation-tables
+[for --plugin-table] whether tables with role="presentation" should be converted
+
+--opt-table-skip-empty-rows
+[for --plugin-table] omit empty rows from the output
+
+--opt-table-span-cell-behavior
+[for --plugin-table] how colspan/rowspan should be rendered: "empty" or "mirror"
+
 --plugin-strikethrough
 enable the plugin ~~strikethrough~~
 
+--plugin-table
+enable the plugin table
+
 
 
 For more information visit the documentation:

cli/html2markdown/cmd/testdata/TestExecute/[general]_help_terminal/stdout.golden

Lines changed: 15 additions & 0 deletions
@@ -72,9 +72,24 @@ Use a HTML sanitizer before displaying the HTML in the browser!
 Make bold text. Should <strong> be indicated by two asterisks or two underscores?
 "**" or "__" (default: "**")
 
+--opt-table-header-promotion
+[for --plugin-table] first row should be treated as a header
+
+--opt-table-presentation-tables
+[for --plugin-table] whether tables with role="presentation" should be converted
+
+--opt-table-skip-empty-rows
+[for --plugin-table] omit empty rows from the output
+
+--opt-table-span-cell-behavior
+[for --plugin-table] how colspan/rowspan should be rendered: "empty" or "mirror"
+
 --plugin-strikethrough
 enable the plugin ~~strikethrough~~
 
+--plugin-table
+enable the plugin table
+
 
 
 For more information visit the documentation:
