Skip to content

Add native support for mvexpand command in PPL #4439

@srikanthpadakanti

Description

@srikanthpadakanti

Is your feature request related to a problem?
Currently, OpenSearch SQL does not provide a native way to expand multivalue fields into separate rows/events, similar to Splunk's mvexpand command. This limits the ability to efficiently work with and analyze multivalue fields directly within queries, which is a common use case in log analytics and data exploration.

What solution would you like?
Introduce an mvexpand command (or equivalent SQL function/operator) in OpenSearch SQL that:

  • Expands a multivalue field into separate rows, one per value in the field, for each input event/document.
  • Supports an optional limit parameter to restrict how many values per event/document are expanded (similar to mvexpand <field> [limit=<int>] in Splunk).
  • Handles memory/resource limits similar to Splunk's RAM usage (with configurable limits and clear error messaging if limits are exceeded).
  • Works as a streaming/distributable command for performance at scale.

What alternatives have you considered?

  • Manual preprocessing of documents to flatten multivalue fields before indexing, which is inflexible and not always possible.
  • Using scripting or custom plugins, which increases complexity for users and maintainers.
  • Workarounds using nested queries or Painless scripts, which are less intuitive and harder for users familiar with Splunk/SQL semantics.

Do you have any additional context?
This feature would help users migrating from Splunk or other analytics platforms that rely on multivalue field expansion. For reference, see Splunk's mvexpand documentation:

Expands the values of a multivalue field into separate events, one event for each value in the multivalue field. For each result, the mvexpand command creates a new result for every multivalue field.

Example usage in Splunk:

... | mvexpand foo
... | mvexpand foo limit=100

It would be valuable to have similar functionality in OpenSearch SQL to facilitate analytics, dashboarding, and data preparation tasks involving arrays or multivalue fields.


Examples and Edge Cases:

  1. Basic Example

    • Input: A document with a field tags: ["error", "warning", "info"]
    • Query: SELECT mvexpand(tags) FROM logs
    • Output:
      tags
      error
      warning
      info
  2. With Limit Option

    • Input: A document with a field ids: [1, 2, 3, 4, 5]
    • Query: SELECT mvexpand(ids, limit=3) FROM docs
    • Output:
      ids
      1
      2
      3
  3. Multiple Multivalue Fields (Edge Case)

    • Input: Document with a: [10, 20], b: [100, 200]
    • Query: Should clarify whether expansion is supported on more than one field at a time. If not, document the limitation.
  4. Empty or Null Arrays (Edge Case)

    • Input: tags: [] or tags: null
    • Output: No rows or a row with null value, depending on desired behavior (should be documented).
  5. Large Arrays / Memory Limits (Edge Case)

    • Input: Document with a field containing thousands of values
    • Behavior: Should trigger a warning or error if memory/resource usage exceeds limits, and document how this is handled.
  6. Non-Array Fields (Edge Case)

    • Input: tags: "error" (not an array)
    • Output: Should document if the field is left as-is, converted to a single-value array, or an error is thrown.

Metadata

Metadata

Labels

PPLPiped processing languageenhancementNew feature or request

Type

No type

Projects

Status

Not Started

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions