
[feat][cp][DynamicMerge] Add Async SpillMerger in LocalMerge#418

Open
guhaiyan0221 wants to merge 1 commit into bytedance:main from guhaiyan0221:fix_cp_async_spill_merger

@guhaiyan0221
Collaborator

What problem does this PR solve?

Issue Number: close #191

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 🚀 Performance improvement (optimization)
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to change)
  • 🔨 Refactoring (no logic changes)
  • 🔧 Build/CI or Infrastructure changes
  • 📝 Documentation only

Description

Summary:
The current `SpillMerger` runs in single-threaded synchronous mode using `SpillMergeStream`. Switching to a multithreaded asynchronous mode lets the merge on the consumer side run in parallel with deserialization and IO on the producer side. This PR creates an async producer for each merge source using futures and callbacks, and executes these producers on the folly executor of the `SpillConfig`. It also unifies the consumer side by using `SourceStream` and `LocalMergeSource` in both `SourceMerger` and `SpillMerger`.
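The producer/consumer split described in the summary can be illustrated with a minimal, dependency-free sketch. It assumes a few simplifications that are not part of the PR: `BoundedSource` and `asyncSpillMerge` are hypothetical names, `std::thread` stands in for the futures and callbacks scheduled on the `SpillConfig`'s folly executor, and sorted `std::vector<int>` runs stand in for spill files. Each run gets its own producer that pushes into a bounded queue (playing the role of `LocalMergeSource`), while a single consumer performs the k-way merge over the queue heads (the merge side of `SourceStream`).

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for LocalMergeSource: a bounded queue fed by one
// producer and drained by the merging consumer. std::nullopt marks end of run.
class BoundedSource {
 public:
  explicit BoundedSource(std::size_t capacity) : capacity_(capacity) {}

  // Producer side: blocks while the queue is full (backpressure).
  void push(std::optional<int> value) {
    std::unique_lock<std::mutex> lock(mu_);
    notFull_.wait(lock, [this] { return items_.size() < capacity_; });
    items_.push(value);
    notEmpty_.notify_one();
  }

  // Consumer side: blocks until the producer has delivered the next value.
  std::optional<int> pop() {
    std::unique_lock<std::mutex> lock(mu_);
    notEmpty_.wait(lock, [this] { return !items_.empty(); });
    std::optional<int> value = items_.front();
    items_.pop();
    notFull_.notify_one();
    return value;
  }

 private:
  const std::size_t capacity_;
  std::mutex mu_;
  std::condition_variable notFull_;
  std::condition_variable notEmpty_;
  std::queue<std::optional<int>> items_;
};

// Merges sorted runs (standing in for spill files) with one asynchronous
// producer per run and a single k-way-merge consumer. Assumes capacity >= 1.
std::vector<int> asyncSpillMerge(const std::vector<std::vector<int>>& runs,
                                 std::size_t queueCapacity = 4) {
  std::vector<std::unique_ptr<BoundedSource>> sources;
  std::vector<std::thread> producers;
  for (std::size_t i = 0; i < runs.size(); ++i) {
    sources.push_back(std::make_unique<BoundedSource>(queueCapacity));
    BoundedSource* source = sources.back().get();
    const std::vector<int>* run = &runs[i];
    // Assumption: a plain thread replaces the folly executor task that would
    // read and deserialize one spill file.
    producers.emplace_back([source, run] {
      for (int value : *run) {
        source->push(value);
      }
      source->push(std::nullopt);
    });
  }

  // Min-heap of (value, source index) over the current head of each source.
  using Entry = std::pair<int, std::size_t>;
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
  for (std::size_t i = 0; i < sources.size(); ++i) {
    if (auto value = sources[i]->pop()) {
      heap.emplace(*value, i);
    }
  }

  std::vector<int> merged;
  while (!heap.empty()) {
    auto [value, index] = heap.top();
    heap.pop();
    merged.push_back(value);
    // Refill from the source that just produced the minimum.
    if (auto next = sources[index]->pop()) {
      heap.emplace(*next, index);
    }
  }

  // Every source has returned its end marker, so every producer has finished.
  for (auto& producer : producers) {
    producer.join();
  }
  return merged;
}
```

For example, `asyncSpillMerge({{1, 4, 7}, {2, 5, 8}, {3, 6, 9}})` returns the values 1 through 9 in order. The bounded queue is the key design choice in this sketch: producers can read ahead of the merge, overlapping IO and deserialization with merging, but only by `queueCapacity` elements per source, which keeps memory bounded.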

Part of facebookincubator/velox#13260

Corresponding PR: facebookincubator/velox#13634

Performance Impact

  • No Impact: This change does not affect the critical path (e.g., build system, doc, error handling).
  • Positive Impact: I have run benchmarks.
  • Negative Impact: Explained below (e.g., trade-off for correctness).

Release Note


Checklist (For Author)

  • I have added/updated unit tests (ctest).
  • I have verified the code with local build (Release/Debug).
  • I have run clang-format / linters.
  • (Optional) I have run Sanitizers (ASAN/TSAN) locally for complex C++ changes.
  • No need to test or manual test.

Breaking Changes

  • No

  • Yes (Description: ...)

    Breaking Changes:
    - Description of the breaking change.
    - Possible solutions or workarounds.
    - Any other relevant information.
    

