zhangfengcdt
Member

Did you read the Contributor Guide?

Is this PR related to a ticket?

  • Yes, and the PR name follows the format [GH-XXX] my subject. Closes #<issue_number>

What changes were proposed in this PR?

Implements a new barrier UDF that prevents filter pushdown and controls predicate evaluation order in complex spatial joins, particularly KNN joins, where evaluation order affects query semantics.

  • Adds Barrier expression with boolean parser supporting comparison operators (=, !=, <>, <, <=, >, >=), logical operators (AND, OR, NOT), and parentheses
  • Registers barrier function in Sedona SQL catalog
  • Provides Scala and Python API wrappers
  • Includes comprehensive test coverage with KNN spatial join scenarios
  • Documents usage in NearestNeighbourSearching.md

In KNN spatial joins, filter placement changes query semantics:

  • Filter before KNN: "What are the K nearest high-rated restaurants?"
  • Filter after KNN: "Of the K nearest restaurants, which are high-rated?"

Query optimizers may push filters down below the join, changing the intended semantics. The barrier function prevents this by evaluating its expression at runtime.
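
To make the difference between the two filter placements concrete, here is a minimal pure-Python sketch. It does not use Sedona or Spark; the sample data and the helper names `k_nearest` and `high_rated` are assumptions made up for illustration.

```python
from math import hypot

# Hypothetical sample data: (name, x, y, rating). Not taken from the PR.
restaurants = [
    ("A", 1.0, 0.0, 3.5),
    ("B", 2.0, 0.0, 4.5),
    ("C", 3.0, 0.0, 3.0),
    ("D", 4.0, 0.0, 4.8),
    ("E", 5.0, 0.0, 4.2),
]
hotel = (0.0, 0.0)


def k_nearest(origin, candidates, k):
    """Return the k candidates closest to origin by Euclidean distance."""
    ox, oy = origin
    return sorted(candidates, key=lambda r: hypot(r[1] - ox, r[2] - oy))[:k]


def high_rated(rows):
    """Keep only rows whose rating is above 4.0."""
    return [r for r in rows if r[3] > 4.0]


# Filter pushed below the KNN join: "the K nearest high-rated restaurants".
filter_then_knn = [r[0] for r in k_nearest(hotel, high_rated(restaurants), 3)]

# Filter kept above the KNN join: "of the K nearest, which are high-rated?".
knn_then_filter = [r[0] for r in high_rated(k_nearest(hotel, restaurants, 3))]

print(filter_then_knn)  # ['B', 'D', 'E']
print(knn_then_filter)  # ['B']
```

The two orderings return different rows from the same inputs, which is why a silent pushdown changes the answer rather than just the plan.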

Usage Example

SELECT h.name, r.name, r.rating
FROM hotels h
JOIN restaurants r ON ST_KNN(h.geometry, r.geometry, 3, false)
WHERE barrier('rating > 4.0 AND stars >= 4',
              'rating', r.rating,
              'stars', h.stars)
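
The call shape above (an expression string followed by alternating name and value arguments) can be sketched in plain Python as follows. This is not the Sedona implementation in BarrierFunction.scala; `tokenize` and `evaluate` are hypothetical names, and the sketch assumes numeric operands only, with no string literals or NULL handling.

```python
import operator
import re

# Tokens: numbers, identifiers/keywords, comparison operators, parentheses.
TOKEN = re.compile(r"\s*(\d+\.?\d*|\w+|<=|>=|<>|!=|=|<|>|\(|\))")
COMPARE = {
    "=": operator.eq, "!=": operator.ne, "<>": operator.ne,
    "<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge,
}


def tokenize(text):
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expression, *pairs):
    """Evaluate expression against alternating (name, value) arguments."""
    env = dict(zip(pairs[0::2], pairs[1::2]))
    tokens, pos = tokenize(expression), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def keyword(word):
        return peek() is not None and peek().upper() == word

    def operand():
        tok = take()
        return env[tok] if tok in env else float(tok)

    def primary():
        if peek() == "(":
            take()
            value = or_expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        left = operand()
        return COMPARE[take()](left, operand())

    def not_expr():
        if keyword("NOT"):
            take()
            return not not_expr()
        return primary()

    def and_expr():
        value = not_expr()
        while keyword("AND"):
            take()
            rhs = not_expr()
            value = value and rhs
        return value

    def or_expr():
        value = and_expr()
        while keyword("OR"):
            take()
            rhs = and_expr()
            value = value or rhs
        return value

    result = or_expr()
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return result


print(evaluate("rating > 4.0 AND stars >= 4", "rating", 4.5, "stars", 4))  # True
```

The precedence here (NOT binds tighter than AND, which binds tighter than OR) follows the usual SQL convention; whether the PR's parser matches it exactly is an assumption.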

How was this patch tested?

  • All existing tests pass
  • 11 new barrier function tests covering basic functionality and KNN scenarios
  • Python API tests
  • Documentation updated

Did this PR include necessary documentation updates?

  • Yes, I am adding a new API. I am using the current SNAPSHOT version number in vX.Y.Z format.
  • Yes, I have updated the documentation.
  • No, this PR does not affect any public API so no need to change the documentation.

Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR implements a new barrier UDF that prevents filter pushdown and controls predicate evaluation order in spatial joins, particularly KNN joins, where evaluation order affects query semantics.

  • Adds a Barrier expression with boolean parser supporting comparison, logical, and grouping operators
  • Registers the barrier function in Sedona SQL catalog and provides Scala/Python API wrappers
  • Removes filter pushdown warnings from KNN join query detector since barrier provides a better solution

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| BarrierFunction.scala | Core implementation of the barrier expression with boolean expression parser |
| st_functions.scala | Scala API wrappers for the barrier function |
| Catalog.scala | Registers barrier function in Sedona SQL catalog |
| st_functions.py | Python API wrapper for the barrier function |
| BarrierFunctionTest.scala | Comprehensive test coverage for barrier function functionality |
| JoinQueryDetector.scala | Removes filter pushdown warning logic replaced by barrier function |
| NearestNeighbourSearching.md | Documentation on using the barrier function with KNN joins |

@zhangfengcdt zhangfengcdt marked this pull request as ready for review September 18, 2025 16:07
@jiayuasu jiayuasu linked an issue Sep 19, 2025 that may be closed by this pull request
@jiayuasu jiayuasu added this to the sedona-1.8.1 milestone Sep 19, 2025
@jiayuasu jiayuasu merged commit c54d144 into apache:master Sep 19, 2025
41 checks passed
@Kontinuation
Member

Late LGTM. However, I still have concerns about the performance overhead of barrier evaluation. We can leave this as a future optimization task.

We are parsing and interpreting the barrier expression each time we evaluate a row, which will probably be a lot slower than the native speed of the query engine. We could use Spark SQL infrastructure such as UnsafeProjection to reduce the performance overhead.
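
The general idea of paying the parse cost once rather than per row can be sketched in plain Python. This is only an illustration of caching a compiled predicate; it is not UnsafeProjection, Spark code generation, or anything from the PR. `compile_predicate` is a hypothetical name, and the sketch assumes a single `name op number` comparison.

```python
import operator
from functools import lru_cache

COMPARE = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
}

parse_calls = 0  # counts how often the expression text is actually parsed


@lru_cache(maxsize=None)
def compile_predicate(expression):
    """Parse 'name op number' once and return a closure over the parsed parts."""
    global parse_calls
    parse_calls += 1
    name, op, literal = expression.split()
    compare, threshold = COMPARE[op], float(literal)
    return lambda row: compare(row[name], threshold)


# Hypothetical rows; each lookup after the first hits the cache.
rows = [{"rating": 4.5}, {"rating": 3.9}, {"rating": 4.1}]
results = [compile_predicate("rating > 4.0")(row) for row in rows]

print(results)      # [True, False, True]
print(parse_calls)  # 1
```

Per-row work drops to a dictionary lookup and one comparison, while the string is split and converted only on the first call.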

@jiayuasu
Member

Thanks for the suggestion! @zhangfengcdt let's fix it.

Successfully merging this pull request may close these issues.

Implement barrier udf in sedona
3 participants