Skip to content

[RFC] Add replace Command to PPL (Calcite Engine) #3975

@manasvinibs

Description

@manasvinibs

Problem Statement:

Currently, OpenSearch PPL lacks a direct string replacement capability, requiring users to use complex workarounds or multiple commands to achieve simple string replacements. This limitation impacts data processing efficiency and user experience when performing text transformations.

Current State

Currently, OpenSearch PPL does not have a direct "replace" command in its standard command set. However, we can achieve string replacement functionality using the eval command with string functions.

source=myindex
| eval new_field = replace(field_name, 'text_to_find', 'text_to_replace')

Proposed Goal

Implement a PPL 'replace' command that efficiently transforms field values in search results through string replacement operations, while maintaining data integrity and optimal performance in OpenSearch.

In Scope:

  1. Replace field values in search results with specified values

    • Support single and multiple field replacements
    • Handle case-sensitive and case-insensitive replacements
    • Process wildcard patterns
    • Support multiple replacement pairs
  2. Field Processing

    • Replace values in original fields only
    • Skip generated fields (from stats/eval)
    • Support default behavior for unspecified fields
  3. Functionality

    • String value replacement
    • Pattern-based matching
    • Field-specific operations
    • Streaming data processing

Out of Scope:

  • Replacement in generated fields (stats/eval)
  • Regular expression support (initial phase)
  • Dynamic value generation
  • Complex transformations

Proposal:

Core Features to Implement:

Basic syntax:

source=index | replace <string_to_replace> WITH <replacement_string>

With field specification

source=index | replace <string_to_replace> WITH <replacement_string> IN field1, field2

Multiple replacements

source=index | replace 
    <string1> WITH <replacement1>
    <string2> WITH <replacement2> 
    IN field1, field2

Examples:


# 1. Basic String Replacement
source=logs 
| replace "error" WITH "ERROR"

# 2. Field-Specific Replacement
source=logs 
| replace "error" WITH "ERROR" IN message, log_level

# 3. Multiple Replacements
source=logs 
| replace 
    "error" WITH "ERROR"
    "warning" WITH "WARNING"
    "info" WITH "INFO"

# 4. Case Sensitivity Option
source=logs 
| replace CASE-SENSITIVE "Error" WITH "ERROR"

# 5. Wildcard Support
source=logs 
| replace *error* WITH "ERROR"

# 6. Multiple Field Support
source=logs 
| replace "error" WITH "ERROR" IN field1, field2, field3

Feature Specification:

Command Properties:
  name: replace
  type: transformation
  input: streaming
  output: streaming

Parameters:
  - pattern: string/wildcard
  - replacement: string
  - fields: array[string]
  - case_sensitive: boolean
  - preserve_original: boolean

Options:
  - CASE-SENSITIVE
  - IN (field specification)
  - WITH (replacement specification)

Implementation Approach

Logical Flow:

1. Input Stream
2. Field Selection
3. Pattern Matching
4. Replacement Application
5. Output Stream

Data Flow:

 1. Extract field values
 2. Apply patterns
 3. Perform replacements
 4. Update fields

Logical Implementation Examples

-- Simple replacement
| replace "error" WITH "ERROR"
→ ReplaceCommand.builder()
    .addReplacement("error", "ERROR")
    .build()

-- Multiple fields
| replace "error" WITH "ERROR" IN field1, field2
→ ReplaceCommand.builder()
    .addReplacement("error", "ERROR")
    .addFields("field1", "field2")
    .build()

-- Case sensitive
| replace CASE-SENSITIVE "Error" WITH "ERROR"
→ ReplaceCommand.builder()
    .addReplacement("Error", "ERROR")
    .setCaseSensitive(true)
    .build()

Metadata

Metadata

Assignees

Labels

PPLPiped processing languageRFCRequest For Comments

Type

No type

Projects

Status

Not Started

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions