Skip to content

fix: LazyMemoryExec should produce independent streams per execute()#21565

Merged
viirya merged 1 commit intoapache:mainfrom
viirya:fix-lazy-memory-exec-shared-state
Apr 14, 2026
Merged

fix: LazyMemoryExec should produce independent streams per execute()#21565
viirya merged 1 commit intoapache:mainfrom
viirya:fix-lazy-memory-exec-shared-state

Conversation

@viirya
Copy link
Copy Markdown
Member

@viirya viirya commented Apr 12, 2026

Which issue does this PR close?

  • Closes #.

Rationale for this change

LazyMemoryExec::execute() shares the same generator instance across multiple calls via Arc::clone, so a second call to execute(0) continues from where the first left off instead of starting from the beginning. This is inconsistent with how other ExecutionPlan implementations behave, where each execute() call produces an independent stream. This was discovered while writing e2e tests for NestedLoopJoinExec memory-limited execution (#21448), where the OOM fallback path re-executes the left child plan and got incomplete results.

What changes are included in this PR?

LazyMemoryExec::execute() was sharing the same generator instance (via Arc::clone) across multiple calls, causing streams to share mutable state. This meant a second call to execute(0) would continue from where the first call left off, instead of starting from the beginning.

Fix by calling reset_state() on the generator to create a fresh instance for each execute() call, matching the expected ExecutionPlan semantics that each execute() produces an independent stream.

Are these changes tested?

Unit test

Are there any user-facing changes?

No

LazyMemoryExec::execute() was sharing the same generator instance
(via Arc::clone) across multiple calls, causing streams to share
mutable state. This meant a second call to execute(0) would continue
from where the first call left off, instead of starting from the
beginning.

Fix by calling reset_state() on the generator to create a fresh
instance for each execute() call, matching the expected ExecutionPlan
semantics that each execute() produces an independent stream.

Co-authored-by: Isaac
@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Apr 12, 2026
Copy link
Copy Markdown
Contributor

@2010YOUY01 2010YOUY01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@viirya viirya added this pull request to the merge queue Apr 14, 2026
@viirya
Copy link
Copy Markdown
Member Author

viirya commented Apr 14, 2026

Thanks @2010YOUY01

Merged via the queue into apache:main with commit f1c643a Apr 14, 2026
36 checks passed
@viirya viirya deleted the fix-lazy-memory-exec-shared-state branch April 14, 2026 02:40
coderfender pushed a commit to coderfender/datafusion that referenced this pull request Apr 14, 2026
…pache#21565)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes #.

## Rationale for this change

LazyMemoryExec::execute() shares the same generator instance across
multiple calls via Arc::clone, so a second call to execute(0) continues
from where the first left off instead of starting from the beginning.
This is inconsistent with how other ExecutionPlan implementations
behave, where each execute() call produces an independent stream. This
was discovered while writing e2e tests for NestedLoopJoinExec
memory-limited execution (apache#21448), where the OOM fallback path
re-executes the left child plan and got incomplete results.

## What changes are included in this PR?

LazyMemoryExec::execute() was sharing the same generator instance (via
Arc::clone) across multiple calls, causing streams to share mutable
state. This meant a second call to execute(0) would continue from where
the first call left off, instead of starting from the beginning.

Fix by calling reset_state() on the generator to create a fresh instance
for each execute() call, matching the expected ExecutionPlan semantics
that each execute() produces an independent stream.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Unit test

## Are there any user-facing changes?

No

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants