- 
                Notifications
    
You must be signed in to change notification settings  - Fork 177
 
Description
Is your feature request related to a problem?
Currently, OpenSearch SQL does not provide a native way to expand multivalue fields into separate rows/events, similar to Splunk's mvexpand command. This limits the ability to efficiently work with and analyze multivalue fields directly within queries, which is a common use case in log analytics and data exploration.
What solution would you like?
Introduce an mvexpand command (or equivalent SQL function/operator) in OpenSearch SQL that:
- Expands a multivalue field into separate rows, one per value in the field, for each input event/document.
 - Supports an optional 
limitparameter to restrict how many values per event/document are expanded (similar tomvexpand <field> [limit=<int>]in Splunk). - Handles memory/resource limits similar to Splunk's RAM usage (with configurable limits and clear error messaging if limits are exceeded).
 - Works as a streaming/distributable command for performance at scale.
 
What alternatives have you considered?
- Manual preprocessing of documents to flatten multivalue fields before indexing, which is inflexible and not always possible.
 - Using scripting or custom plugins, which increases complexity for users and maintainers.
 - Workarounds using nested queries or Painless scripts, which are less intuitive and harder for users familiar with Splunk/SQL semantics.
 
Do you have any additional context?
This feature would help users migrating from Splunk or other analytics platforms that rely on multivalue field expansion. For reference, see Splunk's mvexpand documentation:
Expands the values of a multivalue field into separate events, one event for each value in the multivalue field. For each result, the mvexpand command creates a new result for every multivalue field.
Example usage in Splunk:
... | mvexpand foo
... | mvexpand foo limit=100
It would be valuable to have similar functionality in OpenSearch SQL to facilitate analytics, dashboarding, and data preparation tasks involving arrays or multivalue fields.
Examples and Edge Cases:
- 
Basic Example
- Input: A document with a field 
tags: ["error", "warning", "info"] - Query: 
SELECT mvexpand(tags) FROM logs - Output:
tags error warning info  
 - Input: A document with a field 
 - 
With Limit Option
- Input: A document with a field 
ids: [1, 2, 3, 4, 5] - Query: 
SELECT mvexpand(ids, limit=3) FROM docs - Output:
ids 1 2 3  
 - Input: A document with a field 
 - 
Multiple Multivalue Fields (Edge Case)
- Input: Document with 
a: [10, 20],b: [100, 200] - Query: Should clarify whether expansion is supported on more than one field at a time. If not, document the limitation.
 
 - Input: Document with 
 - 
Empty or Null Arrays (Edge Case)
- Input: 
tags: []ortags: null - Output: No rows or a row with null value, depending on desired behavior (should be documented).
 
 - Input: 
 - 
Large Arrays / Memory Limits (Edge Case)
- Input: Document with a field containing thousands of values
 - Behavior: Should trigger a warning or error if memory/resource usage exceeds limits, and document how this is handled.
 
 - 
Non-Array Fields (Edge Case)
- Input: 
tags: "error"(not an array) - Output: Should document if the field is left as-is, converted to a single-value array, or an error is thrown.
 
 - Input: 
 
Metadata
Metadata
Assignees
Labels
Type
Projects
Status