Description
Problem
The current implementation of reverse in PPL inverts row order with a ROW_NUMBER window function, which is not pushed down to the data source. The sort and the window function computation therefore run locally, which is costly on large datasets and can degrade query performance.
For example, the query source=big5 | reverse | head 10 fails because reverse is applied to the entire big5 dataset before head 10 is applied, and the dataset is too large to reverse locally.
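A minimal sketch of why this ordering is expensive, assuming reverse behaves like "number every row, then sort by that number descending". The function names here are hypothetical and are not part of the PPL engine; they only illustrate that every input row must be materialized before head can discard any of them.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reverse_with_row_number(rows: Sequence[T]) -> List[T]:
    """Hypothetical stand-in for the ROW_NUMBER-based reverse."""
    # Numbering the rows forces the whole input into memory.
    numbered = [(i, row) for i, row in enumerate(rows, start=1)]
    # Sorting by the row number in descending order inverts the input order.
    numbered.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in numbered]


def head(rows: Sequence[T], n: int) -> List[T]:
    """Keep only the first n rows."""
    return list(rows[:n])


# source=... | reverse | head 2 on a toy input: all five rows are
# numbered and sorted even though only two are returned.
result = head(reverse_with_row_number([1, 2, 3, 4, 5]), 2)
```

With pushdown, the data source would return only the rows that head needs, instead of the engine numbering and sorting the full input.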
Goals
- Detect logical plan patterns where pushdown is possible.
- Flip sort direction in pushdown when applicable to implement reverse.
- Two consecutive reverse commands should be a no-op.
- Disallow pushdown or fall back gracefully to local execution if the conditions are not met.
- Add configuration flags or warnings for use on large unbounded datasets.
reverse can be pushed down when it is preceded by a sort, with no filter or limit in between.
In all other cases, reverse cannot be pushed down with the current implementation, because the index has no reverse_row_number to sort on, unless the window function is pushed down as well.
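The goals and the pushdown condition above can be sketched as a small rewrite over a command pipeline. This is an illustration under stated assumptions, not the planner's actual rule: the Sort, Reverse, and Other classes and the rewrite function are hypothetical names, and the sketch ignores null ordering and the order of rows with equal sort keys, both of which a real rule would have to handle.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Sort:
    # Each key is (field, direction), where direction is "asc" or "desc".
    keys: Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class Reverse:
    pass


@dataclass(frozen=True)
class Other:
    # Any other command, for example a filter or a limit.
    name: str


Step = Union[Sort, Reverse, Other]


def flip(sort: Sort) -> Sort:
    """Return the same sort with every direction inverted."""
    return Sort(tuple((f, "desc" if d == "asc" else "asc") for f, d in sort.keys))


def rewrite(pipeline: List[Step]) -> List[Step]:
    """Remove reverse where it can be expressed as a flipped sort or cancels out."""
    out: List[Step] = []
    for step in pipeline:
        if isinstance(step, Reverse) and out:
            prev = out[-1]
            if isinstance(prev, Reverse):
                out.pop()  # two consecutive reverses are a no-op
                continue
            if isinstance(prev, Sort):
                out[-1] = flip(prev)  # directly preceding sort: flip its direction
                continue
        # Fallback: keep the step as-is (reverse stays a local operation).
        out.append(step)
    return out
```

For example, a sort on a timestamp field followed by reverse becomes a single descending sort, which is a form the data source can execute. A reverse that follows a filter or limit is left in place and still runs locally.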
Alternatives
- Tried sorting by _id in descending order, but this may perform poorly because the default is ascending order on _id.
- Sorting by _doc in descending order is another option to try.
Additional context
RFC to implement reverse: #3873
Reverse implementation PR: #3867