-
Notifications
You must be signed in to change notification settings - Fork 177
Description
Problem Statement:
Currently, OpenSearch PPL lacks a direct string replacement capability, requiring users to use complex workarounds or multiple commands to achieve simple string replacements. This limitation impacts data processing efficiency and user experience when performing text transformations.
Current State
Currently, OpenSearch PPL does not have a direct "replace" command in its standard command set. However, we can achieve string replacement functionality using the eval command with string functions.
source=myindex
| eval new_field = replace(field_name, 'text_to_find', 'text_to_replace')
Proposed Goal
Implement a PPL 'replace' command that efficiently transforms field values in search results through string replacement operations, while maintaining data integrity and optimal performance in OpenSearch.
In Scope:
-
Replace field values in search results with specified values
- Support single and multiple field replacements
- Handle case-sensitive and case-insensitive replacements
- Process wildcard patterns
- Support multiple replacement pairs
-
Field Processing
- Replace values in original fields only
- Skip generated fields (from stats/eval)
- Support default behavior for unspecified fields
-
Functionality
- String value replacement
- Pattern-based matching
- Field-specific operations
- Streaming data processing
Out of Scope:
- Replacement in generated fields (stats/eval)
- Regular expression support (initial phase)
- Dynamic value generation
- Complex transformations
Proposal:
Core Features to Implement:
Basic syntax:
source=index | replace <string_to_replace> WITH <replacement_string>
With field specification
source=index | replace <string_to_replace> WITH <replacement_string> IN field1, field2
Multiple replacements
source=index | replace
<string1> WITH <replacement1>
<string2> WITH <replacement2>
IN field1, field2
Examples:
# 1. Basic String Replacement
source=logs
| replace "error" WITH "ERROR"
# 2. Field-Specific Replacement
source=logs
| replace "error" WITH "ERROR" IN message, log_level
# 3. Multiple Replacements
source=logs
| replace
"error" WITH "ERROR"
"warning" WITH "WARNING"
"info" WITH "INFO"
# 4. Case Sensitivity Option
source=logs
| replace CASE-SENSITIVE "Error" WITH "ERROR"
# 5. Wildcard Support
source=logs
| replace *error* WITH "ERROR"
# 6. Multiple Field Support
source=logs
| replace "error" WITH "ERROR" IN field1, field2, field3
Feature Specification:
Command Properties:
name: replace
type: transformation
input: streaming
output: streaming
Parameters:
- pattern: string/wildcard
- replacement: string
- fields: array[string]
- case_sensitive: boolean
- preserve_original: boolean
Options:
- CASE-SENSITIVE
- IN (field specification)
- WITH (replacement specification)
Implementation Approach
Logical Flow:
1. Input Stream
2. Field Selection
3. Pattern Matching
4. Replacement Application
5. Output Stream
Data Flow:
1. Extract field values
2. Apply patterns
3. Perform replacements
4. Update fields
Logical Implementation Examples
-- Simple replacement
| replace "error" WITH "ERROR"
→ ReplaceCommand.builder()
.addReplacement("error", "ERROR")
.build()
-- Multiple fields
| replace "error" WITH "ERROR" IN field1, field2
→ ReplaceCommand.builder()
.addReplacement("error", "ERROR")
.addFields("field1", "field2")
.build()
-- Case sensitive
| replace CASE-SENSITIVE "Error" WITH "ERROR"
→ ReplaceCommand.builder()
.addReplacement("Error", "ERROR")
.setCaseSensitive(true)
.build()
Metadata
Metadata
Assignees
Labels
Type
Projects
Status