feat: As a user, I want error ratio-based circuit breaking in api-breaker plugin, so that I can have more intelligent circuit breaking based on error rates instead of just failure counts
Description
Currently, the api-breaker plugin only supports failure count-based circuit breaking (the unhealthy-count policy), which opens the circuit breaker once the number of consecutive failures reaches a threshold. This approach does not suit every scenario, especially when traffic patterns vary.
I would like to propose adding an error ratio-based circuit breaking policy (unhealthy-ratio) that opens the circuit breaker based on the error rate within a sliding time window, so that breaking behavior adapts to the actual request volume.
Motivation
Current Limitations
- The existing failure count-based approach only considers consecutive failures
- It doesn't account for the overall error rate in relation to total requests
- It may be too sensitive during low-traffic periods and not sensitive enough during high-traffic periods
Benefits of Error Ratio-based Circuit Breaking
- More accurate representation of service health by considering error rate rather than just failure count
- Better handling of varying traffic patterns
- Configurable sliding time window for flexible error rate calculation
- Support for circuit breaker states: CLOSED, OPEN, and HALF_OPEN
Proposed Solution
Add a new policy parameter to the api-breaker plugin with two options:
- `unhealthy-count` (default, existing behavior)
- `unhealthy-ratio` (new error ratio-based policy)
New Configuration Parameters for unhealthy-ratio Policy
| Parameter | Type | Default | Description |
|---|---|---|---|
| `policy` | string | `"unhealthy-count"` | Circuit breaker policy |
| `unhealthy.error_ratio` | number | 0.5 | Error rate threshold (0-1) to trigger circuit breaker |
| `unhealthy.min_request_threshold` | integer | 10 | Minimum requests needed before evaluating error rate |
| `unhealthy.sliding_window_size` | integer | 300 | Sliding window size in seconds for error rate calculation |
| `unhealthy.permitted_number_of_calls_in_half_open_state` | integer | 3 | Number of permitted calls in half-open state |
| `healthy.success_ratio` | number | 0.6 | Success rate threshold to close circuit breaker from half-open state |
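To make the defaults and ranges in the table concrete, here is a minimal Python sketch of how a configuration could be normalized and checked. It is only an illustration, not the plugin's Lua schema: the `normalize` helper and `DEFAULTS` dict are hypothetical names, and the checks on `success_ratio` (0-1) and on the integer parameters (at least 1) are my assumptions rather than part of the proposal.

```python
# Hypothetical helper, not APISIX code: fills in the proposed defaults and
# checks the value ranges described in the parameter table.
DEFAULTS = {
    "policy": "unhealthy-count",
    "unhealthy": {
        "error_ratio": 0.5,
        "min_request_threshold": 10,
        "sliding_window_size": 300,
        "permitted_number_of_calls_in_half_open_state": 3,
    },
    "healthy": {"success_ratio": 0.6},
}


def normalize(conf):
    """Return a copy of conf with defaults applied, or raise ValueError."""
    out = dict(conf)
    out["policy"] = conf.get("policy", DEFAULTS["policy"])
    if out["policy"] not in ("unhealthy-count", "unhealthy-ratio"):
        raise ValueError("unknown policy: %r" % (out["policy"],))

    # User-supplied values override the defaults section by section.
    for section in ("unhealthy", "healthy"):
        out[section] = {**DEFAULTS[section], **conf.get(section, {})}

    # The table gives 0-1 for error_ratio; using the same range for
    # success_ratio is an assumption.
    for section, key in (("unhealthy", "error_ratio"), ("healthy", "success_ratio")):
        value = out[section][key]
        if not 0 <= value <= 1:
            raise ValueError("%s.%s must be within [0, 1]" % (section, key))

    # Assumption: counts and window size must be integers >= 1.
    for key in ("min_request_threshold", "sliding_window_size",
                "permitted_number_of_calls_in_half_open_state"):
        value = out["unhealthy"][key]
        if not isinstance(value, int) or value < 1:
            raise ValueError("unhealthy.%s must be an integer >= 1" % key)
    return out
```

Calling `normalize({"policy": "unhealthy-ratio"})` would yield a configuration with every ratio-related parameter set to the default from the table.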
Example Configuration
{
  "plugins": {
    "api-breaker": {
      "break_response_code": 503,
      "policy": "unhealthy-ratio",
      "max_breaker_sec": 60,
      "unhealthy": {
        "http_statuses": [500, 502, 503, 504],
        "error_ratio": 0.5,
        "min_request_threshold": 10,
        "sliding_window_size": 300,
        "permitted_number_of_calls_in_half_open_state": 3
      },
      "healthy": {
        "http_statuses": [200, 201, 202],
        "success_ratio": 0.6
      }
    }
  }
}
Implementation Details
Circuit Breaker States
- CLOSED: Normal request forwarding
- OPEN: Direct circuit breaker response without forwarding requests
- HALF_OPEN: Limited requests allowed to test service recovery
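As a rough illustration of these three states, the following Python sketch models them as an enum with a transition table. The names `State`, `TRANSITIONS`, `forwards_upstream`, and `can_transition` are hypothetical and do not come from the plugin; the allowed transitions are my reading of the proposal (CLOSED to OPEN, OPEN to HALF_OPEN, HALF_OPEN to either CLOSED or OPEN).

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"        # normal request forwarding
    OPEN = "open"            # respond directly, nothing is forwarded
    HALF_OPEN = "half_open"  # a limited number of probe requests is forwarded


# Assumed transition table, derived from the proposal text.
TRANSITIONS = {
    State.CLOSED: {State.OPEN},
    State.OPEN: {State.HALF_OPEN},
    State.HALF_OPEN: {State.CLOSED, State.OPEN},
}


def forwards_upstream(state):
    """True if at least some requests reach the upstream in this state."""
    return state is not State.OPEN


def can_transition(src, dst):
    """True if the breaker may move directly from src to dst."""
    return dst in TRANSITIONS[src]
```

Keeping the transitions in one table makes it easy to assert in tests that, for example, the breaker never jumps from CLOSED straight to HALF_OPEN.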
Algorithm
- Track requests and errors within a sliding time window
- When the request count ≥ `min_request_threshold` and the error rate ≥ `error_ratio`, open the circuit breaker
- After `max_breaker_sec`, transition to the half-open state
- In the half-open state, allow up to `permitted_number_of_calls_in_half_open_state` requests
- If the success rate of those requests reaches `healthy.success_ratio`, close the circuit breaker; otherwise, reopen it
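The steps above can be sketched in Python as follows. This is an illustrative model only, not the Lua implementation mentioned in this issue: the `RatioBreaker` class and its `allow`/`record` methods are hypothetical names, the clock is passed in explicitly for simplicity, and two behaviors are assumptions on my part: the half-open verdict is taken only after all permitted calls have completed, and the window is cleared whenever the breaker opens.

```python
from collections import deque


class RatioBreaker:
    """Illustrative error-ratio breaker; arguments mirror the proposed parameters."""

    def __init__(self, error_ratio=0.5, min_request_threshold=10,
                 sliding_window_size=300, max_breaker_sec=60,
                 half_open_calls=3, success_ratio=0.6):
        self.error_ratio = error_ratio
        self.min_request_threshold = min_request_threshold
        self.sliding_window_size = sliding_window_size
        self.max_breaker_sec = max_breaker_sec
        self.half_open_calls = half_open_calls
        self.success_ratio = success_ratio
        self.state = "CLOSED"
        self.events = deque()   # (timestamp, is_error) pairs inside the window
        self.opened_at = None
        self.admitted = 0       # probe requests let through while half-open
        self.probes = []        # outcomes of completed probe requests

    def _open(self, now):
        self.state = "OPEN"
        self.opened_at = now
        self.events.clear()     # assumption: start from a clean window later

    def allow(self, now):
        """Decide whether a request arriving at time `now` may be forwarded."""
        if self.state == "OPEN" and now - self.opened_at >= self.max_breaker_sec:
            self.state = "HALF_OPEN"
            self.admitted = 0
            self.probes = []
        if self.state == "OPEN":
            return False
        if self.state == "HALF_OPEN":
            if self.admitted >= self.half_open_calls:
                return False
            self.admitted += 1
        return True

    def record(self, now, is_error):
        """Record the outcome of a forwarded request."""
        if self.state == "OPEN":
            return              # late responses are ignored in this sketch
        if self.state == "HALF_OPEN":
            self.probes.append(is_error)
            if len(self.probes) >= self.half_open_calls:
                successes = self.probes.count(False)
                if successes / len(self.probes) >= self.success_ratio:
                    self.state = "CLOSED"
                else:
                    self._open(now)
            return
        self.events.append((now, is_error))
        # Evict entries that have fallen out of the sliding window.
        while self.events and self.events[0][0] <= now - self.sliding_window_size:
            self.events.popleft()
        total = len(self.events)
        errors = sum(1 for _, failed in self.events if failed)
        if total >= self.min_request_threshold and errors / total >= self.error_ratio:
            self._open(now)
```

A caller would invoke `allow(now)` before forwarding a request and `record(now, is_error)` once the response status is known. Storing one entry per request keeps the sketch simple; a real implementation in a shared-memory setting would more likely use bucketed counters.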
Backward Compatibility
This enhancement is fully backward compatible:
- Existing configurations continue to work without changes
- Default `policy` is `"unhealthy-count"` (existing behavior)
- No breaking changes to existing APIs
Testing
Comprehensive test coverage will be provided including:
- Schema validation tests for new parameters
- Functional tests for error ratio calculation
- Circuit breaker state transition tests
- Integration tests with various traffic patterns
- Backward compatibility tests
Use Cases
- High-traffic services: Better handling of error spikes in high-volume scenarios
- Variable traffic patterns: Adaptive behavior for services with fluctuating request rates
- Microservices architectures: More precise circuit breaking for service mesh environments
- SLA-based circuit breaking: Configure circuit breaker based on acceptable error rates
Files to be Modified
- `apisix/plugins/api-breaker.lua` - Core plugin logic
- `t/plugin/api-breaker.t` - Test cases (new test file for ratio-based tests)
- `docs/en/latest/plugins/api-breaker.md` - English documentation
- `docs/zh/latest/plugins/api-breaker.md` - Chinese documentation
Additional Information
This feature has been implemented and tested locally. I'm ready to submit a PR with:
- Complete implementation of the error ratio-based circuit breaking
- Comprehensive test suite following APISIX testing standards
- Updated documentation in both English and Chinese
- Backward compatibility preservation
Would appreciate feedback on this proposal and guidance on the contribution process.