Contributor

@kosabogi commented Nov 7, 2025

This PR adds documentation for the plaintext and markdown separator group options available in the recursive chunking strategy.

Changes

  • Added documentation section for predefined separator groups (plaintext and markdown)
  • Added regex pattern details into collapsible dropdowns
  • Refined wording and structure for clarity

Related issue: #3015

@kosabogi requested review from a team as code owners November 7, 2025 11:25
@kosabogi added the `enhancement` and `Team:Developer` labels Nov 7, 2025
github-actions bot commented Nov 7, 2025

🔍 Preview links for changed docs

Member

@dan-rubinstein left a comment

Thanks for adding this documentation. This will really help make it clear what options are available to the user.

Contributor

@benironside left a comment

Nice work. Just a few minor suggestions but nothing blocking.


##### Markdown separator group
You can configure the `recursive` strategy using either:
- [Predefined separator groups](#separator-groups): [`plaintext`](#plaintext) or [`markdown`](#markdown)

Contributor

Suggested change
- [Predefined separator groups](#separator-groups): [`plaintext`](#plaintext) or [`markdown`](#markdown)
- [Predefined separator groups](#separator-groups): [`Plaintext`](#plaintext) or [`markdown`](#markdown)

Maybe capitalize to match the following line

Contributor

or else make "Custom separators" lower case?

The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures chunking with the `recursive` strategy using the markdown separator group and a maximum of 200 words per chunk.
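
A minimal sketch of the request body that the sentence above describes, built in Python for illustration. The field names (`chunking_settings`, `strategy`, `max_chunk_size`, `separator_group`) are assumed from the Elasticsearch inference API's chunking settings. The model ID and allocation values are placeholders, and `build_endpoint_body` is a hypothetical helper, not part of any client library.

```python
import json


def build_endpoint_body(separator_group="markdown", max_chunk_size=200):
    """Build a request body for an inference endpoint (hypothetical helper).

    Intended as the JSON body of a request such as
    PUT _inference/sparse_embedding/<endpoint-id>; field names are assumptions.
    """
    return {
        "service": "elasticsearch",
        "service_settings": {
            "model_id": ".elser_model_2",  # placeholder ELSER model ID
            "num_allocations": 1,          # placeholder sizing
            "num_threads": 1,              # placeholder sizing
        },
        "chunking_settings": {
            "strategy": "recursive",
            "max_chunk_size": max_chunk_size,    # maximum words per chunk
            "separator_group": separator_group,  # "plaintext" or "markdown"
        },
    }


body = build_endpoint_body()
print(json.dumps(body, indent=2))
```

Passing `separator_group="plaintext"` would select the other predefined group under the same assumptions.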
##### Predefined separator groups [separator-groups]

Predefined separator groups provide optimized patterns for common text formats: [`plaintext`](#plaintext) for simple line-structured text without markup, and [`markdown`](#markdown) for Markdown-formatted content.

Contributor

Suggested change
Predefined separator groups provide optimized patterns for common text formats: [`plaintext`](#plaintext) for simple line-structured text without markup, and [`markdown`](#markdown) for Markdown-formatted content.
Predefined separator groups provide optimized patterns for common text formats: [`plaintext`](#plaintext) works for simple line-structured text without markup, and [`markdown`](#markdown) works for Markdown-formatted content.

or maybe "is for"
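
For readers of this thread who want a feel for what a separator group does, here is a rough sketch of recursive splitting in general: try the coarsest separator first, and only fall back to finer ones for pieces that are still too long. This is not Elasticsearch's implementation. The patterns in `EXAMPLE_SEPARATORS` are hypothetical stand-ins, not the regexes shipped in the `plaintext` or `markdown` groups, and `recursive_chunks` is an illustrative name.

```python
import re

# Hypothetical patterns, ordered from coarsest to finest.
# These are NOT the patterns used by the predefined separator groups.
EXAMPLE_SEPARATORS = [r"\n# ", r"\n\n"]


def recursive_chunks(text, separators, max_words):
    """Split text on separators, coarsest first, until pieces fit max_words.

    A piece that still exceeds max_words after all separators are used is
    returned as-is in this sketch.
    """
    if len(text.split()) <= max_words or not separators:
        return [text] if text.strip() else []
    first, rest = separators[0], separators[1:]
    chunks = []
    # A lookahead split keeps each separator attached to the piece it starts.
    for piece in re.split(f"(?={first})", text):
        chunks.extend(recursive_chunks(piece, rest, max_words))
    return chunks


doc = "# A\none two three\n\nfour five six\n# B\nseven eight"
for chunk in recursive_chunks(doc, EXAMPLE_SEPARATORS, max_words=5):
    print(repr(chunk))
```

Because the split uses a lookahead, no characters are dropped: joining the chunks of this sample reproduces the input.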

4 participants