Panic when having right anti join with filter in sort_merge_join and different number of columns between each side #18787

@rluvaton

Created a PR just for adding fuzz tests:

Describe the bug

The query in "To Reproduce" panics with the following backtrace:

thread 'tokio-runtime-worker' panicked at /Users/rluvaton/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-array-56.2.0/src/record_batch.rs:609:22:
index out of bounds: the len is 2 but the index is 2
stack backtrace:
   0: __rustc::rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::panicking::panic_bounds_check
   3: arrow_array::record_batch::RecordBatch::column
   4: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
   5: arrow_select::concat::concat_batches
   6: <datafusion_physical_plan::joins::sort_merge_join::stream::SortMergeJoinStream as futures_core::stream::Stream>::poll_next
   7: datafusion_common_runtime::trace_utils::trace_future::{{closure}}
   8: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
   9: tokio::runtime::task::core::Core<T,S>::poll
  10: tokio::runtime::task::harness::Harness<T,S>::poll
  11: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
  12: tokio::runtime::scheduler::multi_thread::worker::Context::run
  13: tokio::runtime::context::scoped::Scoped<T>::set
  14: tokio::runtime::context::runtime::enter_runtime
  15: tokio::runtime::scheduler::multi_thread::worker::run
  16: tokio::runtime::task::core::Core<T,S>::poll
  17: tokio::runtime::task::harness::Harness<T,S>::poll
  18: tokio::runtime::blocking::pool::Inner::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
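
The message "the len is 2 but the index is 2", together with frames 3 to 5, is consistent with concat_batches asking a batch for its third column while that batch holds only two. The toy model below is a sketch of that failure mode only. `Batch` and `concat_columns` are hypothetical names, not arrow or DataFusion APIs, and the sketch does not claim to show where the narrow batch comes from inside SortMergeJoinStream. It reports an error where the real code panics.

```rust
/// Hypothetical stand-in for a record batch: just a list of integer columns.
struct Batch {
    columns: Vec<Vec<i64>>,
}

/// Concatenate batches column by column for `num_fields` schema fields.
/// Returns an error (rather than panicking) when a batch is narrower
/// than the schema it is being concatenated under.
fn concat_columns(num_fields: usize, batches: &[Batch]) -> Result<Vec<Vec<i64>>, String> {
    let mut out = Vec::with_capacity(num_fields);
    for i in 0..num_fields {
        let mut col = Vec::new();
        for (n, b) in batches.iter().enumerate() {
            // An unchecked `b.columns[i]` here would panic with
            // "index out of bounds" for a batch that is too narrow.
            let src = b.columns.get(i).ok_or_else(|| {
                format!(
                    "batch {n} has {} columns but column index {i} was requested",
                    b.columns.len()
                )
            })?;
            col.extend_from_slice(src);
        }
        out.push(col);
    }
    Ok(out)
}

fn main() {
    // A three-column batch (shaped like t2) and a two-column batch (shaped like t1).
    let wide = Batch { columns: vec![vec![31], vec![35], vec![109]] };
    let narrow = Batch { columns: vec![vec![31], vec![32]] };

    // Concatenating under a three-field schema fails on the narrow batch.
    match concat_columns(3, &[wide, narrow]) {
        Ok(cols) => println!("concatenated {} columns", cols.len()),
        Err(e) => println!("error: {e}"),
    }
}
```

In this model the mismatch surfaces at column index 2, the same index the panic reports.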

To Reproduce

$ RUST_BACKTRACE=1 datafusion-cli
DataFusion CLI v50.3.0
> set datafusion.optimizer.prefer_hash_join = false;
0 row(s) fetched.
Elapsed 0.001 seconds.

> select * from (
with
t1 as (
    select 31 a, 32 b union all
    select 31 a, 33 b
),
t2 as (
    select 31 a, 32 b, 108 c union all
    select 31 a, 35 b, 109 c
)
select t2.* from t1 right anti join t2 on t1.a = t2.a and t1.b = t2.b and t1.b <= t2.c
) order by 1, 2;


thread 'tokio-runtime-worker' panicked at /Users/rluvaton/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-array-56.2.0/src/record_batch.rs:609:22:
index out of bounds: the len is 2 but the index is 2
stack backtrace:
...

Expected behavior

The query should complete without panicking.
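
For reference, the rows the query should produce can be worked out by hand. The sketch below is a naive nested-loop model of the right anti join from the reproduction, written only to show the expected result. `right_anti_join` is a hypothetical helper, not a DataFusion API, and it assumes standard anti join semantics: keep each t2 row for which no t1 row satisfies the full ON condition.

```rust
/// Row of t1: (a, b). Row of t2: (a, b, c).
type Left = (i64, i64);
type Right = (i64, i64, i64);

/// Naive model of
/// `t1 RIGHT ANTI JOIN t2 ON t1.a = t2.a AND t1.b = t2.b AND t1.b <= t2.c`:
/// keep every t2 row that has no matching t1 row.
fn right_anti_join(t1: &[Left], t2: &[Right]) -> Vec<Right> {
    let mut out: Vec<Right> = t2
        .iter()
        .copied()
        .filter(|&(ra, rb, rc)| {
            !t1.iter().any(|&(la, lb)| la == ra && lb == rb && lb <= rc)
        })
        .collect();
    // Mirrors the outer `ORDER BY 1, 2`.
    out.sort_by_key(|&(a, b, _)| (a, b));
    out
}

fn main() {
    // Same literal rows as the t1 and t2 CTEs in the reproduction.
    let t1: [Left; 2] = [(31, 32), (31, 33)];
    let t2: [Right; 2] = [(31, 32, 108), (31, 35, 109)];

    let rows = right_anti_join(&t1, &t2);
    println!("{:?}", rows); // prints [(31, 35, 109)]
}
```

The t2 row (31, 32, 108) matches the t1 row (31, 32) because 32 <= 108, so it is dropped. The t2 row (31, 35, 109) has no t1 row with b = 35, so it is the only row returned.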

Additional context

While reviewing the PR below, I checked how filters in sort merge join behave for other join types when the two sides have a different number of columns, and found this. Thank you, @tglanz.
