
Improve eventBroker Scan Strategy with Memory-Aware Scan Window Algorithm #4172

@asddongmen

Description

Problem

Currently, the eventBroker can scan too many events for a single dispatcher in one pass, which quickly exhausts the memory quota. This leads to several issues:

  1. Dispatcher Starvation: In DDL or syncpoint scenarios, certain dispatchers can be starved because one dispatcher monopolizes the available memory quota while others wait indefinitely.
  2. Frequent Reset Events: Under high-throughput workloads, the memory quota is frequently exhausted, triggering reset events. This makes synchronization uneven and degrades overall performance.
  3. Unbalanced Event Distribution: The eventBroker has no mechanism to distribute scanning resources fairly across dispatchers.

Proposed Solution

Introduce a memory-aware scan window algorithm that dynamically adjusts the scan interval based on the memory quota usage watermark. The key components include:

  1. Sliding Window Memory Monitoring: Track memory usage samples over a configurable time window (e.g., 30 seconds) to compute average, max, and trend statistics.
  2. Tiered Response Thresholds:
    • Critical (>90%): Aggressively reduce the scan interval to 1/4
    • High (>70%): Reduce the scan interval to 1/2
    • Trend Damping (>30% and rising): Proactively reduce the scan interval by 10%
    • Low (<20%): Increase the scan interval by 25%
    • Very Low (<10%): Increase the scan interval by 50%
  3. "Fast Brake, Slow Accelerate" Policy:
    • Decreases are applied immediately when memory pressure rises
    • Increases require cooldown periods and stable conditions to prevent oscillation
  4. Scan Window Coordination: Use a base timestamp (minSentTs) combined with the dynamic scan interval to compute the maximum timestamp (scanMaxTs) for each scan operation, ensuring dispatchers progress together.

Expected Benefits

  • Eliminate Dispatcher Starvation: All dispatchers get fair access to scanning resources
  • Reduce Reset Events: Memory quota is managed proactively, avoiding sudden exhaustion
  • Smoother Synchronization: Consistent throughput without memory pressure spikes
  • Better DDL/Syncpoint Handling: Critical operations complete without being blocked by memory exhaustion

Labels: type/enhancement