Support MySQL in-source parallel deserialization #4122
Draft
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Introducing AsyncScheduler with PartitionedDeserializationScheduler
🚀 Core Architecture
Intelligent Event Routing: Leverages primary key-based partitioning to distribute data-change events across dedicated single-thread partition workers, ensuring optimal load distribution and thread safety.
Robust Queue Management: Implements bounded per-partition queues using
ArrayBlockingQueue
with a strict blocking policy, guaranteeing zero data loss through our no-drop mechanism.Ordering Guarantees: The source thread's
drainRound
serves as the exclusive collector, maintaining per-key ordering integrity throughout the entire processing pipeline.Reliable Checkpoint Recovery: Binds binlog offset advancement to post-emit operations, ensuring bulletproof checkpoint consistency and seamless recovery capabilities.
🌐 Global Async Control Path
Comprehensive Event Support: Seamlessly handles control events (DDL/Watermark/Heartbeat) through a sophisticated global async pathway.
Advanced Completion Management: Utilizes sequence-to-batch mapping for intelligent out-of-order completion handling while maintaining strict in-order emission guarantees.
⚙️ Flexible Configuration Framework
Builder-Driven Design: Offers comprehensive configuration control through intuitive builder patterns:
parallelDeserializeEnabled
- Master switch (disabled by default)parallelDeserializePkWorkers
- Primary key worker scalingparallelDeserializeThreads
- Thread pool optimizationparallelDeserializeQueueCapacity
- Queue size tuningBackward Compatibility: When disabled, the system gracefully falls back to the original single-thread execution path with zero scheduler overhead.
🛡️ Enhanced Stability & Performance
Concurrency Safety: Resolves potential
ConcurrentModificationException
issues during emission by implementing snapshot-to-array conversion before iteration, eliminating iterator modification conflicts.Code Excellence: Features comprehensive Chinese comments, streamlined code paths, and removal of legacy JVM system properties for improved maintainability.
📋 Migration Guide
Activation Steps:
parallelDeserializeEnabled=true
pkWorkers
,threads
, andqueue
parametersSeamless Transition: When disabled, behavior remains 100% identical to previous versions, ensuring zero-risk deployment.
Integration Requirements: Ensure downstream systems respect per-key ordering constraints; all collection operations remain anchored to the source thread for consistency.