[vibe bench] feat: implement LeftSingle join type for scalar subquery decorrelation#20999
Dandandan wants to merge 3 commits into apache:main
Introduce a new LeftSingle join operator that behaves like a left outer join but errors at runtime if more than one right-side row matches any given left-side row. This enforces the SQL scalar subquery invariant (at most one row) at the join level rather than relying on recursive evaluation, enabling O(n) hash-based execution.

The scalar_subquery_to_join optimizer rule now emits LeftSingle joins instead of Left joins, and the eliminate_outer_join rule can still convert LeftSingle to Inner when the right side is proven non-nullable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
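The semantics described in this commit message can be illustrated with a minimal sketch. It is not the code in this PR and does not use DataFusion's APIs: `left_single_join` is a hypothetical helper over plain slices, and the `i64` key and string payload are assumptions chosen for brevity. It only shows how a single build pass plus a single probe pass can return NULLs for unmatched rows and raise an error on a second match.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not DataFusion's API): joins `left` keys against
/// `right` (key, value) pairs with LeftSingle semantics.
fn left_single_join<'a>(
    left: &[i64],
    right: &[(i64, &'a str)],
) -> Result<Vec<(i64, Option<&'a str>)>, String> {
    // Build phase: one pass over the right side, grouping values by key.
    let mut table: HashMap<i64, Vec<&'a str>> = HashMap::new();
    for &(key, value) in right {
        table.entry(key).or_default().push(value);
    }

    // Probe phase: one pass over the left side.
    let mut out = Vec::with_capacity(left.len());
    for &key in left {
        match table.get(&key).map(|values| values.as_slice()) {
            // No match: keep the left row, NULL (None) for the right column.
            None => out.push((key, None)),
            // Exactly one match: emit the matched pair.
            Some([single]) => out.push((key, Some(*single))),
            // More than one match: the scalar subquery invariant is violated.
            Some(_) => {
                return Err(format!("more than one right row matched key {key}"));
            }
        }
    }
    Ok(out)
}
```

In this sketch, duplicate keys on the right side only produce an error when some left row actually probes them, which mirrors the "any given left-side row" wording above.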
run benchmarks
- Replace HashSet with bitmap in check_single_match for efficiency
- Use DataFusionError::Execution instead of Internal (user-facing error)
- Simplify supports_swap to just `!matches!(self, LeftSingle)`
- Don't eliminate LeftSingle to Inner (would lose at-most-one constraint)
- Use nested match instead of if/else chain in functional_dependencies

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
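As a rough illustration of the first bullet, the sketch below tracks "has this left row already matched?" with one bit per row instead of a hash set. The `Bitmap` type, the signature of `check_single_match`, and the use of `String` as the error type are all assumptions made for this example; the actual function in the PR may look quite different.

```rust
/// A fixed-size bitmap with one bit per left-side row (illustrative type).
struct Bitmap {
    words: Vec<u64>,
}

impl Bitmap {
    /// Creates a bitmap able to hold `len` bits, all initially unset.
    fn new(len: usize) -> Self {
        Self { words: vec![0; (len + 63) / 64] }
    }

    /// Sets bit `i` and returns whether it was already set.
    /// Panics if `i` is outside the range the bitmap was created for.
    fn test_and_set(&mut self, i: usize) -> bool {
        let word = i / 64;
        let mask = 1u64 << (i % 64);
        let was_set = self.words[word] & mask != 0;
        self.words[word] |= mask;
        was_set
    }
}

/// Hypothetical stand-in for `check_single_match`: `left_indices` holds the
/// left-row index of every matched pair in one batch of join output.
fn check_single_match(left_indices: &[usize], seen: &mut Bitmap) -> Result<(), String> {
    for &i in left_indices {
        if seen.test_and_set(i) {
            // A second match for the same left row is a user-facing error.
            return Err(format!("left row {i} matched more than one right row"));
        }
    }
    Ok(())
}
```

Because `seen` is passed in by the caller, a repeated match is detected even when the two matches arrive in different batches.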
🤖 Benchmark completed (GKE): tpch, resource usage for base (merge-base) and branch
run benchmarks
🤖 Benchmark completed (GKE): tpcds, resource usage for base (merge-base) and branch
🤖 Benchmark completed (GKE): clickbench_partitioned, resource usage for base (merge-base) and branch
🤖 Benchmark completed (GKE): tpch, resource usage for base (merge-base) and branch
🤖 Benchmark completed (GKE): clickbench_partitioned, resource usage for base (merge-base) and branch
🤖 Benchmark completed (GKE): tpcds, resource usage for base (merge-base) and branch
…gression

The eliminate_outer_join optimization is safe for LeftSingle because:
- The WHERE clause already filters out NULL (unmatched) rows
- The GROUP BY added during scalar subquery decorrelation guarantees at-most-one row per group, so the single-match constraint holds
- Without this, LeftSingle is treated like Left for optimization, preventing filter pushdown and causing massive plan regressions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
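The first two bullets can be checked on a toy example. The sketch below assumes the right side is already unique per key (modelled here as a `HashMap`, standing in for the GROUP BY guarantee) and that the WHERE clause rejects NULL right values. Both function names are hypothetical and nothing here is DataFusion code; it only shows that, under those two assumptions, an outer join followed by the filter produces the same rows as an inner join.

```rust
use std::collections::HashMap;

/// Left outer join against a right side that is unique per key, followed by
/// a filter that drops rows whose right value is NULL (None).
fn left_join_then_filter(left: &[i64], right: &HashMap<i64, i64>) -> Vec<(i64, i64)> {
    left.iter()
        // Outer join: unmatched left rows get None for the right column.
        .map(|k| (*k, right.get(k).copied()))
        // WHERE right_value IS NOT NULL: unmatched rows are removed again.
        .filter_map(|(k, v)| v.map(|v| (k, v)))
        .collect()
}

/// Plain inner join against the same unique right side.
fn inner_join(left: &[i64], right: &HashMap<i64, i64>) -> Vec<(i64, i64)> {
    left.iter()
        .filter_map(|k| right.get(k).map(|v| (*k, *v)))
        .collect()
}
```

If the right side were not unique per key, the inner join would silently return extra rows where LeftSingle would raise an error, which is why the uniqueness assumption matters for this rewrite.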
run benchmarks
🤖 Benchmark completed (GKE): tpch, resource usage for base (merge-base) and branch
🤖 Benchmark completed (GKE): clickbench_partitioned, resource usage for base (merge-base) and branch
🤖 Benchmark completed (GKE): tpcds, resource usage for base (merge-base) and branch
I think this is the main change
"[vibe bench]" 😆
Probably not worth a new join type at this point (even though it is not a particularly complex one).
alamb left a comment
Somehow I forgot to submit this review
/// - If a left row has exactly one match on the right: returns the matched pair
/// - If a left row has no match on the right: returns the left row with NULLs for right columns
/// - If a left row has more than one match on the right: returns a runtime error
LeftSingle,
At Vertica we called this type of join a PK join or a Unique join, as it assumes any particular join key is unique.
Maybe we can use a similar term here (rather than Single):
LeftUnique 🤔