Simplify logic for memory pressure partial emit from ordered group by#20559
Simplify logic for memory pressure partial emit from ordered group by#20559alamb wants to merge 6 commits intoapache:mainfrom
Conversation
kosiew
left a comment
There was a problem hiding this comment.
Looks good to me.
Left a non-blocking comment.
| /// Returns the number of groups groups that can be emitted to avoid an | ||
| /// out-of-memory condition. | ||
| /// | ||
| /// Returns `None` if emitting is not possible. | ||
| /// | ||
| /// Returns `Some(EmitTo)` with the number of groups to emit if it is possible | ||
| /// to emit some groups to free memory. | ||
| fn emit_target_for_oom(&self) -> Option<EmitTo> { |
There was a problem hiding this comment.
emit_target_for_oom is a nice extraction, but it mixes two different concerns:
- the generic OOM‑batching policy (“how many rows would I like to emit?”) and
- the GroupOrdering rules (“what emit strategies are legal given the current ordering?” – None has its own special case).
I think renaming/doc‑tweaking the helper (e.g. oom_emit_to / “returns the emit strategy to use under OOM”) would clarify this.
Alternatively, move the ordering‑aware clamping logic onto GroupOrdering (or into aggregates/order/mod.rs).
That way row_hash.rs just decides how much it wants to emit under memory pressure, and the ordering module enforces its invariants – which keeps the rules in one place and makes the helper reusable for other emit paths.
There was a problem hiding this comment.
This is a good insight -- I moved the code (well had codex move the code) into GroupValues and I think it is a much nicer encapsulation of concerns. It also is easier to unit test
| /// while respecting the current ordering guarantees. | ||
| /// | ||
| /// Returns `None` if no data can be emitted. | ||
| pub fn oom_emit_to(&self, n: usize) -> Option<EmitTo> { |
There was a problem hiding this comment.
this is quite a bit nicer API now (thanks @kosiew for the shout out)
Which issue does this PR close?
GroupOrderingPartial::remove_groupswhen Partial aggregate with PartiallySorted hits memory pressure #20445Rationale for this change
I found the formulation of the fix in #20446 hard to follow (see #20446 (review) for details).
Basically the meaning of emit_to and 0 are inverted in this case.
What changes are included in this PR?
Pull the logic of what to emit into its own function with more comments that I think make it clearer what is going on
Are these changes tested?
Yes by existing tests
Are there any user-facing changes?