[GH-2356] Implement barrier udf function #2357
Pull Request Overview
This PR implements a new `barrier` UDF function to prevent filter pushdown and control predicate evaluation order in spatial joins, particularly KNN joins where evaluation order affects query semantics.
- Adds a `Barrier` expression with boolean parser supporting comparison, logical, and grouping operators
- Registers the barrier function in Sedona SQL catalog and provides Scala/Python API wrappers
- Removes filter pushdown warnings from KNN join query detector since barrier provides a better solution
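To make the "boolean parser" bullet concrete, here is a minimal Python sketch of a recursive-descent evaluator for comparison, logical, and grouping operators. It is not the Scala code in `BarrierFunction.scala`; the names `tokenize`, `Evaluator`, and `evaluate` are hypothetical, and the grammar (numeric and single-quoted string literals, `NOT` binding tighter than `AND`, `AND` tighter than `OR`) is an assumption for illustration.

```python
import operator
import re

# One token per match: number, quoted string, word, comparison operator, or parenthesis.
TOKEN_RE = re.compile(
    r"\s*(?:(\d+(?:\.\d+)?)|'([^']*)'|([A-Za-z_]\w*)|(<=|>=|<>|!=|=|<|>)|([()]))"
)

COMPARATORS = {
    "=": operator.eq, "!=": operator.ne, "<>": operator.ne,
    "<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge,
}


def tokenize(text):
    """Split an expression string into (kind, value) tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at position {pos}: {text[pos:]!r}")
        number, string, word, op, paren = m.groups()
        if number is not None:
            tokens.append(("lit", float(number)))
        elif string is not None:
            tokens.append(("lit", string))
        elif word is not None:
            upper = word.upper()
            tokens.append(("kw", upper) if upper in ("AND", "OR", "NOT") else ("var", word))
        elif op is not None:
            tokens.append(("op", op))
        else:
            tokens.append(("paren", paren))
        pos = m.end()
    return tokens


class Evaluator:
    """Parses and evaluates in one pass: OR < AND < NOT < comparison / parentheses."""

    def __init__(self, tokens, variables):
        self.tokens, self.pos, self.variables = tokens, 0, variables

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse_or(self):
        value = self.parse_and()
        while self.peek() == ("kw", "OR"):
            self.take()
            rhs = self.parse_and()  # always consume the right-hand side
            value = value or rhs
        return value

    def parse_and(self):
        value = self.parse_not()
        while self.peek() == ("kw", "AND"):
            self.take()
            rhs = self.parse_not()
            value = value and rhs
        return value

    def parse_not(self):
        if self.peek() == ("kw", "NOT"):
            self.take()
            return not self.parse_not()
        return self.parse_primary()

    def parse_primary(self):
        if self.peek() == ("paren", "("):
            self.take()
            value = self.parse_or()
            if self.take() != ("paren", ")"):
                raise ValueError("expected ')'")
            return value
        left = self.operand()
        kind, op = self.take()
        if kind != "op":
            raise ValueError("expected a comparison operator")
        return COMPARATORS[op](left, self.operand())

    def operand(self):
        kind, value = self.take()
        if kind == "lit":
            return value
        if kind == "var":
            return self.variables[value]
        raise ValueError(f"expected an operand, got {value!r}")


def evaluate(expression, variables):
    parser = Evaluator(tokenize(expression), variables)
    result = parser.parse_or()
    if parser.pos != len(parser.tokens):
        raise ValueError("unexpected trailing tokens")
    return result
```

For example, `evaluate("rating > 4.0 AND (stars >= 4 OR NOT name = 'x')", {"rating": 4.5, "stars": 3, "name": "y"})` walks the string once and returns a boolean for that single row of values.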
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
BarrierFunction.scala | Core implementation of the barrier expression with boolean expression parser |
st_functions.scala | Scala API wrappers for the barrier function |
Catalog.scala | Registers barrier function in Sedona SQL catalog |
st_functions.py | Python API wrapper for the barrier function |
BarrierFunctionTest.scala | Comprehensive test coverage for barrier function functionality |
JoinQueryDetector.scala | Removes filter pushdown warning logic replaced by barrier function |
NearestNeighbourSearching.md | Documentation on using the barrier function with KNN joins |
Late LGTM. However, I still have concerns about the performance overhead of barrier evaluation. We can leave this as a future optimization task. We are parsing and interpreting the barrier expression each time we evaluate a row, which will probably be much slower than the native speed of the query engine. We may utilize Spark SQL infrastructures such as
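The per-row cost described in this comment can be illustrated with a generic parse-once, evaluate-many sketch in Python. This is not necessarily the approach the reviewer had in mind, and it does not use Spark SQL infrastructure; `compile_predicate` is a hypothetical helper, and its grammar is deliberately restricted to `name op number` comparisons joined by `AND`.

```python
import operator
import re

COMPARATORS = {
    "=": operator.eq, "!=": operator.ne, "<>": operator.ne,
    "<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge,
}
CLAUSE_RE = re.compile(
    r"^\s*([A-Za-z_]\w*)\s*(<=|>=|<>|!=|=|<|>)\s*(-?\d+(?:\.\d+)?)\s*$"
)


def compile_predicate(expression):
    """Parse `expression` once and return a function that evaluates one row.

    Assumption for this sketch: only `name op number` clauses joined by AND.
    """
    clauses = []
    for part in re.split(r"\s+AND\s+", expression, flags=re.IGNORECASE):
        m = CLAUSE_RE.match(part)
        if not m:
            raise ValueError(f"unsupported clause: {part!r}")
        name, op, number = m.groups()
        clauses.append((name, COMPARATORS[op], float(number)))

    def predicate(row):
        # No string parsing here: only dictionary lookups and comparisons.
        return all(compare(row[name], value) for name, compare, value in clauses)

    return predicate


rows = [
    {"rating": 4.5, "stars": 5},
    {"rating": 3.9, "stars": 5},
    {"rating": 4.2, "stars": 3},
]
keep = compile_predicate("rating > 4.0 AND stars >= 4")  # parsed once
print([keep(row) for row in rows])  # [True, False, False]
```

The design point is only that the string is tokenized and validated once, outside the row loop, so each row pays for comparisons rather than for parsing.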
Thanks for the suggestion! @zhangfengcdt let's fix it.
Did you read the Contributor Guide?
Is this PR related to a ticket?
`[GH-XXX] my subject`. Closes #<issue_number>
What changes were proposed in this PR?
Implements a new `barrier` UDF function to prevent filter pushdown and control predicate evaluation order in complex spatial joins, particularly KNN joins where evaluation order affects query semantics.
- `Barrier` expression with boolean parser supporting comparison operators (=, !=, <>, <, <=, >, >=), logical operators (AND, OR, NOT), and parentheses
In KNN spatial joins, filter placement changes query semantics: a filter applied before the join restricts the candidates from which the K nearest neighbors are chosen, while a filter applied after the join only discards rows from the K neighbors already found.
Query optimizers may push filters down, changing the intended semantics. The barrier function prevents this by evaluating expressions at runtime.
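A small, engine-independent Python sketch shows why filter placement matters for a KNN join. The data, the `knn` helper, and the `is_good` predicate are all hypothetical; the brute-force nearest-neighbour search stands in for the join and is not how Sedona executes it.

```python
import math


def knn(query, objects, k):
    """Return the k objects closest to `query` by Euclidean distance."""
    return sorted(objects, key=lambda o: math.dist(query, o["xy"]))[:k]


# Hypothetical data: one query point and four candidate objects with a rating.
query = (0.0, 0.0)
objects = [
    {"name": "a", "xy": (1.0, 0.0), "rating": 3.0},
    {"name": "b", "xy": (2.0, 0.0), "rating": 5.0},
    {"name": "c", "xy": (3.0, 0.0), "rating": 4.5},
    {"name": "d", "xy": (4.0, 0.0), "rating": 4.8},
]


def is_good(o):
    return o["rating"] > 4.0


# Filter AFTER the join: take the 2 nearest neighbours, then drop low-rated ones.
filter_after = [o["name"] for o in knn(query, objects, 2) if is_good(o)]

# Filter BEFORE the join (the effect of pushdown): drop low-rated objects
# first, then take the 2 nearest of what remains.
filter_before = [o["name"] for o in knn(query, [o for o in objects if is_good(o)], 2)]

print(filter_after)   # ['b']
print(filter_before)  # ['b', 'c']
```

Both results are valid answers to different questions, which is why an optimizer silently moving the filter changes what the query means.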
Usage Example
How was this patch tested?
Did this PR include necessary documentation updates?
`vX.Y.Z` format.