Documents separator groups for recursive chunking strategy #3848
base: main
Conversation
dan-rubinstein left a comment
Thanks for adding this documentation. This will really help make it clear what options are available to the user.
benironside left a comment
Nice work. Just a few minor suggestions but nothing blocking.
| ##### Markdown separator group | ||
| You can configure the `recursive` strategy using either: | ||
| - [Predefined separator groups](#separator-groups): [`plaintext`](#plaintext) or [`markdown`](#markdown) |
| - [Predefined separator groups](#separator-groups): [`plaintext`](#plaintext) or [`markdown`](#markdown) | |
| - [Predefined separator groups](#separator-groups): [`Plaintext`](#plaintext) or [`markdown`](#markdown) |
Maybe capitalize to match the following line
Or else make "Custom separators" lowercase?
| The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures chunking with the `recursive` strategy using the markdown separator group and a maximum of 200 words per chunk. | ||
| ##### Predefined separator groups [separator-groups] | ||
| Predefined separator groups provide optimized patterns for common text formats: [`plaintext`](#plaintext) for simple line-structured text without markup, and [`markdown`](#markdown) for Markdown-formatted content. |
| Predefined separator groups provide optimized patterns for common text formats: [`plaintext`](#plaintext) for simple line-structured text without markup, and [`markdown`](#markdown) for Markdown-formatted content. | |
| Predefined separator groups provide optimized patterns for common text formats: [`plaintext`](#plaintext) works for simple line-structured text without markup, and [`markdown`](#markdown) works for Markdown-formatted content. |
or maybe "is for"
This PR adds documentation for the `plaintext` and `markdown` separator group options available in the `recursive` chunking strategy.
Changes
`plaintext` and `markdown`)
Related issue: #3015
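
For reviewers who want to picture the configuration the documented example describes (an `elasticsearch` service endpoint running ELSER, chunked with the `recursive` strategy, the markdown separator group, and at most 200 words per chunk), here is a minimal Python sketch that only builds the request path and body. It sends nothing. The field names `separator_group` and `max_chunk_size`, the `sparse_embedding` task type, the model ID, the allocation settings, and the helper name `build_recursive_chunking_request` are assumptions for illustration and are not taken from this PR; check them against the published docs before relying on them.

```python
import json

# Separator groups named in this PR.
PREDEFINED_GROUPS = ("plaintext", "markdown")


def build_recursive_chunking_request(endpoint_id, separator_group="markdown",
                                     max_chunk_size=200):
    """Build (path, body) for a hypothetical inference endpoint request.

    All field names below are assumptions, not confirmed by the PR text.
    """
    if separator_group not in PREDEFINED_GROUPS:
        raise ValueError(
            f"separator_group must be one of {PREDEFINED_GROUPS}, "
            f"got {separator_group!r}"
        )

    # Assumed path layout: _inference/<task_type>/<endpoint_id>
    path = f"_inference/sparse_embedding/{endpoint_id}"

    body = {
        "service": "elasticsearch",
        "service_settings": {
            "model_id": ".elser_model_2",  # assumed ELSER model ID
            "num_allocations": 1,          # assumed value
            "num_threads": 1,              # assumed value
        },
        "chunking_settings": {
            "strategy": "recursive",
            "max_chunk_size": max_chunk_size,    # maximum words per chunk
            "separator_group": separator_group,  # predefined group name
        },
    }
    return path, body


if __name__ == "__main__":
    path, body = build_recursive_chunking_request("my-elser-endpoint")
    print("PUT", path)
    print(json.dumps(body, indent=2))
```

Validating the group name client-side is a design choice for the sketch only; the server may accept other values or report its own error.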