feat: As a user, I want error ratio-based circuit breaking in api-breaker plugin, so that I can have more intelligent circuit breaking based on error rates instead of just failure counts #12763

@HaoTien

Description

Currently, the api-breaker plugin supports only failure count-based circuit breaking (the unhealthy-count policy), which opens the circuit breaker when the number of consecutive failures reaches a threshold. This approach does not suit every scenario, especially when traffic volume varies.

I propose adding an error ratio-based circuit breaking policy (unhealthy-ratio) that opens the circuit breaker based on the error rate within a sliding time window, so that breaking behavior adapts to the actual share of failing requests.

Motivation

Current Limitations

  • The existing failure count-based approach only considers consecutive failures
  • It doesn't account for the overall error rate in relation to total requests
  • It may be too sensitive during low-traffic periods and not sensitive enough during high-traffic periods
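
To make the first two limitations concrete, here is a small illustrative sketch in Python (not plugin code). The helper names and the sample traffic pattern are hypothetical; the sketch only compares the longest run of consecutive failures with the overall error ratio for the same sequence of responses.

```python
def max_consecutive_failures(outcomes):
    """Longest run of consecutive failures (False = failed request)."""
    longest = current = 0
    for ok in outcomes:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


def error_ratio(outcomes):
    """Fraction of failed requests in the sequence."""
    return sum(1 for ok in outcomes if not ok) / len(outcomes)


# Hypothetical traffic: two failures followed by one success, repeated 10 times.
outcomes = [False, False, True] * 10

print(max_consecutive_failures(outcomes))  # 2
print(round(error_ratio(outcomes), 2))     # 0.67
```

With this pattern, a consecutive-failure threshold of 3 would never be reached, even though two thirds of the requests fail.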

Benefits of Error Ratio-based Circuit Breaking

  • More accurate representation of service health by considering error rate rather than just failure count
  • Better handling of varying traffic patterns
  • Configurable sliding time window for flexible error rate calculation
  • Support for circuit breaker states: CLOSED, OPEN, and HALF_OPEN

Proposed Solution

Add a new policy parameter to the api-breaker plugin with two options:

  • unhealthy-count (default, existing behavior)
  • unhealthy-ratio (new error ratio-based policy)

New Configuration Parameters for unhealthy-ratio Policy

| Parameter | Type | Default | Description |
|---|---|---|---|
| policy | string | "unhealthy-count" | Circuit breaker policy |
| unhealthy.error_ratio | number | 0.5 | Error rate threshold (0-1) to trigger circuit breaker |
| unhealthy.min_request_threshold | integer | 10 | Minimum requests needed before evaluating error rate |
| unhealthy.sliding_window_size | integer | 300 | Sliding window size in seconds for error rate calculation |
| unhealthy.permitted_number_of_calls_in_half_open_state | integer | 3 | Number of permitted calls in half-open state |
| healthy.success_ratio | number | 0.6 | Success rate threshold to close circuit breaker from half-open state |

Example Configuration

{
  "plugins": {
    "api-breaker": {
      "break_response_code": 503,
      "policy": "unhealthy-ratio",
      "max_breaker_sec": 60,
      "unhealthy": {
        "http_statuses": [500, 502, 503, 504],
        "error_ratio": 0.5,
        "min_request_threshold": 10,
        "sliding_window_size": 300,
        "permitted_number_of_calls_in_half_open_state": 3
      },
      "healthy": {
        "http_statuses": [200, 201, 202],
        "success_ratio": 0.6
      }
    }
  }
}
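
As a rough illustration of how the defaults in the table would apply to a partial configuration, here is a Python sketch (the plugin itself is written in Lua, and its schema validation may differ). The function name `resolve_ratio_conf` is hypothetical, and the range checks beyond the stated 0-1 range for error_ratio are assumptions.

```python
# Defaults taken from the parameter table in this proposal.
DEFAULTS = {
    "policy": "unhealthy-count",
    "unhealthy": {
        "error_ratio": 0.5,
        "min_request_threshold": 10,
        "sliding_window_size": 300,
        "permitted_number_of_calls_in_half_open_state": 3,
    },
    "healthy": {"success_ratio": 0.6},
}


def resolve_ratio_conf(conf):
    """Merge the proposed defaults into a plugin conf and check basic ranges."""
    policy = conf.get("policy", DEFAULTS["policy"])
    if policy not in ("unhealthy-count", "unhealthy-ratio"):
        raise ValueError(f"unknown policy: {policy!r}")

    unhealthy = {**DEFAULTS["unhealthy"], **conf.get("unhealthy", {})}
    healthy = {**DEFAULTS["healthy"], **conf.get("healthy", {})}

    if not 0 <= unhealthy["error_ratio"] <= 1:
        raise ValueError("unhealthy.error_ratio must be within [0, 1]")
    # Assumption: success_ratio uses the same 0-1 range as error_ratio.
    if not 0 <= healthy["success_ratio"] <= 1:
        raise ValueError("healthy.success_ratio must be within [0, 1]")
    # Assumption: the integer parameters must be at least 1.
    for key in ("min_request_threshold", "sliding_window_size",
                "permitted_number_of_calls_in_half_open_state"):
        if unhealthy[key] < 1:
            raise ValueError(f"unhealthy.{key} must be >= 1")

    return {"policy": policy, "unhealthy": unhealthy, "healthy": healthy}


# Only the policy is set; everything else falls back to the defaults.
resolved = resolve_ratio_conf({"policy": "unhealthy-ratio"})
print(resolved["unhealthy"]["error_ratio"])  # 0.5
```

An empty configuration resolves to the "unhealthy-count" policy, which matches the backward-compatible default described in this proposal.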

Implementation Details

Circuit Breaker States

  • CLOSED: Normal request forwarding
  • OPEN: Direct circuit breaker response without forwarding requests
  • HALF_OPEN: Limited requests allowed to test service recovery

Algorithm

  1. Track requests and errors within a sliding time window
  2. When request count ≥ min_request_threshold and error rate ≥ error_ratio, open circuit breaker
  3. After max_breaker_sec, transition to half-open state
  4. In half-open state, allow up to permitted_number_of_calls_in_half_open_state requests
  5. If the success rate of those requests reaches healthy.success_ratio, close the circuit breaker; otherwise, reopen it
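
The steps above can be sketched as a small state machine. This is a minimal Python illustration under stated assumptions, not the Lua implementation: the class name `RatioBreaker` and its methods are hypothetical, timestamps are passed in explicitly, and the half-open decision is assumed to be made once all permitted calls have returned.

```python
from collections import deque

CLOSED, OPEN, HALF_OPEN = "CLOSED", "OPEN", "HALF_OPEN"


class RatioBreaker:
    """Illustrative error-ratio breaker; `now` is a timestamp in seconds."""

    def __init__(self, error_ratio=0.5, min_request_threshold=10,
                 sliding_window_size=300, max_breaker_sec=60,
                 permitted_calls=3, success_ratio=0.6):
        self.error_ratio = error_ratio
        self.min_request_threshold = min_request_threshold
        self.sliding_window_size = sliding_window_size
        self.max_breaker_sec = max_breaker_sec
        self.permitted_calls = permitted_calls
        self.success_ratio = success_ratio
        self.state = CLOSED
        self.window = deque()        # (timestamp, is_error) pairs
        self.opened_at = None
        self.half_open_admitted = 0
        self.half_open_results = []

    def allow(self, now):
        """Return True if a request arriving at `now` may be forwarded."""
        if self.state == OPEN and now - self.opened_at >= self.max_breaker_sec:
            self.state = HALF_OPEN                      # step 3
            self.half_open_admitted = 0
            self.half_open_results = []
        if self.state == CLOSED:
            return True
        if (self.state == HALF_OPEN
                and self.half_open_admitted < self.permitted_calls):
            self.half_open_admitted += 1                # step 4
            return True
        return False

    def record(self, now, is_error):
        """Record the outcome of a request that was forwarded."""
        if self.state == OPEN:
            return  # response to a request admitted before opening; ignored
        if self.state == HALF_OPEN:
            self.half_open_results.append(is_error)
            if len(self.half_open_results) == self.permitted_calls:  # step 5
                ok = self.half_open_results.count(False) / self.permitted_calls
                if ok >= self.success_ratio:
                    self.state = CLOSED
                    self.window.clear()
                else:
                    self._open(now)
            return
        self.window.append((now, is_error))             # step 1
        while self.window and self.window[0][0] <= now - self.sliding_window_size:
            self.window.popleft()                       # drop expired entries
        total = len(self.window)
        errors = sum(1 for _, err in self.window if err)
        if (total >= self.min_request_threshold
                and errors / total >= self.error_ratio):
            self._open(now)                             # step 2

    def _open(self, now):
        self.state = OPEN
        self.opened_at = now


breaker = RatioBreaker()
for t in range(10):                  # 10 requests, every other one fails
    breaker.allow(t)
    breaker.record(t, is_error=(t % 2 == 0))
print(breaker.state)                 # OPEN
print(breaker.allow(10))             # False
```

Storing one entry per request keeps the sketch short; a real implementation would more likely aggregate counts per time bucket to bound memory use.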

Backward Compatibility

This enhancement is fully backward compatible:

  • Existing configurations continue to work without changes
  • Default policy is "unhealthy-count" (existing behavior)
  • No breaking changes to existing APIs

Testing

Test coverage will include:

  • Schema validation tests for new parameters
  • Functional tests for error ratio calculation
  • Circuit breaker state transition tests
  • Integration tests with various traffic patterns
  • Backward compatibility tests

Use Cases

  1. High-traffic services: Better handling of error spikes in high-volume scenarios
  2. Variable traffic patterns: Adaptive behavior for services with fluctuating request rates
  3. Microservices architectures: More precise circuit breaking for service mesh environments
  4. SLA-based circuit breaking: Configure circuit breaker based on acceptable error rates

Files to be Modified

  • apisix/plugins/api-breaker.lua - Core plugin logic
  • t/plugin/api-breaker.t - Test cases (new test file for ratio-based tests)
  • docs/en/latest/plugins/api-breaker.md - English documentation
  • docs/zh/latest/plugins/api-breaker.md - Chinese documentation

Additional Information

This feature has been implemented and tested locally. I'm ready to submit a PR with:

  • Complete implementation of the error ratio-based circuit breaking
  • Comprehensive test suite following APISIX testing standards
  • Updated documentation in both English and Chinese
  • Backward compatibility preservation

I would appreciate feedback on this proposal and guidance on the contribution process.
