fix(sql): fix a bug when planning semi- or antijoins by aalexandrov · Pull Request #20990 · apache/datafusion

aalexandrov · 2026-03-17T13:02:10Z

Closes #20989 (exclude_using_columns might wrongly retain columns from a projected join input).

Rationale for this change

The planner should be consistent with the expected SQL behavior—swapping the names of tables that have identical structure in a SQL query should not affect the schema for that query.

What changes are included in this PR?

A fix in the exclude_using_columns helper method in datafusion/expr/src/utils.rs that ensures that we don't retain columns from the projected side when deciding which USING columns to exclude and which to retain on top of semi- or antijoins.
Regression tests for the change in test_using_join_wildcard_schema_semi_anti.

Are these changes tested?

Added a regression test.

Are there any user-facing changes?

Yes, the change is user-facing, but I doubt that the current behavior is expected or documented anywhere.
If existing docs need to be updated, please point me to the concrete places and I can take a look.

Currently, the `exclude_using_columns` called from `expand_wildcard` doesn't consider the filtering semantics of semi- and antijoins when expanding wildcards on top of joins defined via `USING(<columns>)` syntax. From each set of columns equated by a `USING(<column>)` expression, the code currently (1) sorts the set entries, and (2) retains only the first entry from each set. Because of that, the columns surviving the `exclude_using_columns` call might be wrongly chosen from the filtering side if the table qualifier from that side is lexicographically before the filtered side qualifier.

For example, given this schema of two identical tables:

```sql
create table s(x1 int, x2 int, x3 int);
create table t(x1 int, x2 int, x3 int);
```

One would expect that the schema of queries where the `s` and `t` names are swapped will be identical. However, currently this is not the case:

```sql
-- Q1 schema: x1 int, x2 int, x3 int (because s < t)
select * from s left semi join t using (x1);
-- Q2 schema: x2 int, x3 int (because t < s)
select * from t left semi join s using (x1);
```

This commit fixes the issue and adds some regression tests.
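To make the sort-and-retain-first behavior concrete, here is a minimal standalone sketch of it in Rust. It is not the code from `datafusion/expr/src/utils.rs`: the `Col` alias and the functions `expand_naive` and `expand_schema_aware` are hypothetical names introduced for illustration, and the "schema-aware" variant is only one plausible reading of the fix (ignore USING entries that are not part of the join's output schema before deciding what to exclude).

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a qualified column: (table qualifier, column name).
type Col = (&'static str, &'static str);

/// Behavior as described in the commit message: sort each USING set,
/// retain its first entry, and exclude the remaining entries.
fn expand_naive(visible: &[Col], using_sets: &[Vec<Col>]) -> Vec<Col> {
    let mut excluded: HashSet<Col> = HashSet::new();
    for set in using_sets {
        let mut sorted = set.clone();
        sorted.sort(); // lexicographic order on (qualifier, name)
        excluded.extend(sorted.into_iter().skip(1));
    }
    visible
        .iter()
        .copied()
        .filter(|c| !excluded.contains(c))
        .collect()
}

/// Assumed fix (illustrative only): drop USING entries that are not in the
/// join's output schema, then apply the same sort-and-retain-first rule.
fn expand_schema_aware(visible: &[Col], using_sets: &[Vec<Col>]) -> Vec<Col> {
    let in_schema: HashSet<Col> = visible.iter().copied().collect();
    let filtered: Vec<Vec<Col>> = using_sets
        .iter()
        .map(|set| {
            set.iter()
                .copied()
                .filter(|c| in_schema.contains(c))
                .collect()
        })
        .collect();
    expand_naive(visible, &filtered)
}
```

For Q2 above, the visible columns are `t.x1, t.x2, t.x3` and the USING set is `{t.x1, s.x1}`. In this sketch `expand_naive` retains `s.x1` (which the semijoin does not output) and excludes `t.x1`, leaving only `t.x2, t.x3`, while `expand_schema_aware` keeps all three `t` columns.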

github-actions bot added the sql (SQL Planner) and logical-expr (Logical plan and expressions) labels on Mar 17, 2026

aalexandrov wants to merge 1 commit into apache:main from aalexandrov:fix_semijoin_using_wildcard_planner_bug
