Skip to content

Simplify logic for memory pressure partial emit from ordered group by#20559

Open
alamb wants to merge 6 commits intoapache:mainfrom
alamb:alamb/simpler_partial_emit
Open

Simplify logic for memory pressure partial emit from ordered group by#20559
alamb wants to merge 6 commits intoapache:mainfrom
alamb:alamb/simpler_partial_emit

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Feb 25, 2026

Which issue does this PR close?

Rationale for this change

I found the formulation of the fix in #20446 hard to follow (see #20446 (review) for details).

Basically the meaning of emit_to and 0 are inverted in this case.

What changes are included in this PR?

Pull the logic of what to emit into its own function with more comments that I think make it clearer what is going on

Are these changes tested?

Yes by existing tests

Are there any user-facing changes?

@alamb alamb marked this pull request as ready for review February 26, 2026 13:55
Copy link
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
Left a non-blocking comment.

Comment on lines +1046 to +1053
/// Returns the number of groups groups that can be emitted to avoid an
/// out-of-memory condition.
///
/// Returns `None` if emitting is not possible.
///
/// Returns `Some(EmitTo)` with the number of groups to emit if it is possible
/// to emit some groups to free memory.
fn emit_target_for_oom(&self) -> Option<EmitTo> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emit_target_for_oom is a nice extraction, but it mixes two different concerns:

  • the generic OOM‑batching policy (“how many rows would I like to emit?”) and
  • the GroupOrdering rules (“what emit strategies are legal given the current ordering?” – None has its own special case).

I think renaming/doc‑tweaking the helper (e.g. oom_emit_to / “returns the emit strategy to use under OOM”) would clarify this.

Alternatively, move the ordering‑aware clamping logic onto GroupOrdering (or into aggregates/order/mod.rs).
That way row_hash.rs just decides how much it wants to emit under memory pressure, and the ordering module enforces its invariants – which keeps the rules in one place and makes the helper reusable for other emit paths.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good insight -- I moved the code (well had codex move the code) into GroupValues and I think it is a much nicer encapsulation of concerns. It also is easier to unit test

/// while respecting the current ordering guarantees.
///
/// Returns `None` if no data can be emitted.
pub fn oom_emit_to(&self, n: usize) -> Option<EmitTo> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is quite a bit nicer API now (thanks @kosiew for the shout out)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants