[DO NOT MERGE] Revert "[TT-9234] regression fixes for failing mdcb readiness check (… #7231

lghiur · 2025-07-11T12:46:15Z

PR Type

Bug fix, Tests

Description

Reverts logic for RPC critical failure to previous behavior
Removes emergency mode check for RPC in health check
Deletes related test cases for RPC emergency mode scenarios
Simplifies isCriticalFailure and its test coverage

Changes diagram

flowchart LR
  A["isCriticalFailure logic"] -- "Remove emergency mode check for RPC" --> B["Simplified RPC critical check"]
  B -- "Update tests" --> C["Remove emergency mode test cases"]

Changes walkthrough 📝

Relevant files

Bug fix

health_check.go `Remove emergency mode check from RPC critical failure logic` gateway/health_check.go Removes emergency mode check for RPC in `isCriticalFailure` Restores previous logic for determining RPC as critical	+2/-2

Tests

health_check_test.go `Remove and simplify tests for RPC emergency mode logic` gateway/health_check_test.go Deletes test cases for RPC emergency mode scenarios Simplifies test table for `isCriticalFailure` Removes dependency on `rpc` package in tests	+0/-38

Need help?
Type /help how to ... in the comments thread for any questions about PR-Agent usage.
Check out the documentation for more information.

…7215)" This reverts commit a564981.

buger · 2025-07-11T12:46:44Z

💔 The detected issue is not in one of the allowed statuses 💔

Detected Status	DoD Check	❌
Allowed Statuses	In Dev,In Code Review,Ready for Testing,In Test,In Progress,In Review	✔️

Please ensure your jira story is in one of the allowed statuses

github-actions · 2025-07-11T12:47:08Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review Logic Change The logic for determining if the RPC component is a critical failure has been reverted to ignore the emergency mode check. Reviewers should validate that this change aligns with the intended system behavior and does not reintroduce previously fixed issues. // Consider RPC critical only if using RPC if component == "rpc" && gw.GetConfig().Policies.PolicySource == "rpc" { return true } Test Coverage Reduction Test cases for RPC emergency mode scenarios have been removed. Ensure that the remaining tests still provide adequate coverage for the critical failure logic, especially for edge cases involving the RPC component. Status: Warn, ComponentType: System, }, setupConfig: func(conf config.Config) { conf.UseDBAppConfigs = true }, expectedResult: true, // Critical based on component and config, not status }, } for _, tt := range tests { t.Run(tt.name, func(t testing.T) { // Create a new gateway instance for each test conf := config.Config{} tt.setupConfig(&conf) gw := NewGateway(conf, nil) // Call the function under test result := gw.isCriticalFailure(tt.component)

github-actions · 2025-07-11T12:47:17Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Restore emergency mode check for RPC Removing the emergency mode check may cause the system to treat the `rpc` component as critical even during emergency mode, potentially leading to unnecessary failures. Restore the emergency mode condition to avoid marking `rpc` as critical when in emergency mode. gateway/health_check.go [304-307] -// Consider RPC critical only if using RPC -if component == "rpc" && gw.GetConfig().Policies.PolicySource == "rpc" { +// Consider RPC critical only if using RPC and not in emergency mode +if component == "rpc" && gw.GetConfig().Policies.PolicySource == "rpc" && !rpc.IsEmergencyMode() { return true } Suggestion importance[1-10]: 8 __ Why: The suggestion correctly identifies that removing the emergency mode check could cause the system to treat the `rpc` component as critical even during emergency mode, which may not be the intended behavior. Restoring this check is important for correct system failover logic and prevents unnecessary failures.	Medium

github-actions · 2025-07-11T12:48:13Z

API Changes

no api changes detected

github-actions · 2025-07-13T10:11:10Z

Dependency Impact Analysis: RPC Emergency Mode in Health Checks

This PR reverts a change to how RPC (Remote Procedure Call) failures are evaluated in Tyk Gateway's health check system. The change specifically affects MDCB (Multi Data Center Bridge) deployments and their readiness checks.

Key Changes

Simplified RPC Critical Failure Logic:
- Before: RPC was considered a critical failure only if the policy source was RPC AND the system was NOT in emergency mode
- After: RPC is considered a critical failure if the policy source is RPC, regardless of emergency mode
Test Simplification:
- Removed test cases specifically for RPC emergency mode scenarios
- Removed dependency on the rpc package in the health check tests
- Simplified the test table for isCriticalFailure

Technical Impact Analysis

Emergency Mode Context

In Tyk Gateway, "emergency mode" is a resilience mechanism that activates when the RPC connection to MDCB fails. When in emergency mode, the system falls back to using cached data instead of trying to fetch fresh data from the RPC server, allowing the gateway to continue functioning during RPC outages.

Health Check Behavior Change

The change modifies how health checks report the system status when RPC is down:

Previous behavior: When in emergency mode, RPC failures would NOT be considered critical, so health checks would still pass (return 200 OK)
New behavior: RPC failures are ALWAYS considered critical if RPC is the policy source, regardless of emergency mode (will return 503 Service Unavailable)

MDCB Readiness Check Impact

This change specifically addresses a regression issue with MDCB readiness checks (TT-9234). The readiness endpoint (/ready) is used by orchestration systems like Kubernetes to determine if a pod is ready to receive traffic.

By removing the emergency mode condition, the readiness check will more accurately reflect the actual state of the RPC connection, which is important for proper cluster management in MDCB deployments. This ensures that systems relying on these health checks get accurate information about the gateway's connectivity status.

Dependency Impact

This change affects systems that rely on Tyk Gateway's health/readiness checks:

Kubernetes/Orchestration Systems: Services that route traffic based on health check results will now see RPC failures as critical, even during emergency mode. This may affect pod availability reporting.
Monitoring Systems: Alerts may trigger for RPC failures even when the system is operating in its designed fallback state.
MDCB Deployments: The PR specifically fixes issues with readiness checks in MDCB scenarios, ensuring more accurate reporting of the gateway's connection status.

Conclusion

This change represents a trade-off between nuanced health reporting and accurate connection status reporting. While the gateway can continue to function in emergency mode with cached data, the health check will now report the actual RPC connection status more accurately, which is important for proper orchestration in MDCB deployments.

The simplification makes the health check behavior more straightforward and consistent, at the cost of potentially reporting the gateway as unhealthy even when it's operating in a valid fallback state.

Tip: Mention me again using /dependency <request>.
Powered by Probe AI\nTyk Gateway Dependency Impact Reviewer

github-actions · 2025-07-13T10:12:28Z

PR Analysis: Revert RPC Emergency Mode Check in Health Check

This PR reverts changes related to how the Tyk gateway handles RPC critical failures in health checks. The changes are focused on fixing regression issues with MDCB (Multi Data Center Bridge) readiness checks.

Changes Overview

In gateway/health_check.go:

Removes the emergency mode check from the RPC critical failure logic

The condition is simplified from:

// Consider RPC critical only if using RPC and gw not in emergency mode
if component == "rpc" && gw.GetConfig().Policies.PolicySource == "rpc" && !rpc.IsEmergencyMode() {

to:

// Consider RPC critical only if using RPC
if component == "rpc" && gw.GetConfig().Policies.PolicySource == "rpc" {

In gateway/health_check_test.go:
- Removes the import of the rpc package
- Removes the setupFunc field from the test struct which was used to set up emergency mode
- Deletes two test cases that were specifically testing RPC emergency mode scenarios:
  - "rpc component is NOT critical when PolicySource is rpc but in emergency mode"
  - "rpc component is critical when PolicySource is rpc and NOT in emergency mode"
- Removes the code that would call setupFunc in the test

Technical Context

Emergency Mode

Emergency mode is a flag that is set when the RPC connection is down and the gateway is configured to use the local cache. The IsEmergencyMode() function in the rpc package returns this state.

MDCB (Multi Data Center Bridge)

MDCB is a component of Tyk that allows for multi-data center deployments. It relies on RPC for communication between data centers.

Readiness Check

The readiness check is a dedicated endpoint that verifies if the gateway is ready to serve requests by checking:

Redis connection status
Successful API reload
Other critical components

Unlike the liveness check, the readiness check returns a 503 Service Unavailable status if the gateway is not ready to serve requests.

Impact Analysis

This change reverts to a simpler behavior where RPC is considered a critical component whenever the policy source is set to "rpc", regardless of whether the system is in emergency mode or not.

Problem Being Fixed

The previous implementation was causing issues with MDCB readiness checks. When the system entered emergency mode (due to RPC connection issues), the health check would not consider RPC as critical, which could lead to misleading health status reporting.

With the emergency mode check in place, if RPC was down but the system was in emergency mode, the health check would report that everything was fine (since RPC wasn't considered critical in emergency mode). This could mask actual issues with the RPC connection that needed attention.

New Behavior

With this change:

If RPC is configured as the policy source, it will always be considered critical
If RPC fails, it will always be considered a critical failure when RPC is the policy source
The readiness check will now fail appropriately when RPC is down, regardless of emergency mode

This ensures that the health check accurately reflects the state of the RPC connection, which is particularly important for MDCB deployments where RPC is a critical component for communication between data centers.

Conclusion

This PR simplifies the RPC critical failure logic by removing the emergency mode condition, which should fix the regression issues with MDCB readiness checks. The change is well-tested with the removal of test cases that are no longer relevant after this change.

The PR title indicates this is a reversion of a previous change that introduced the emergency mode check, suggesting that the original behavior (without the emergency mode check) was more reliable for MDCB readiness checks.

Tip: Mention me again using /performance <request>.
Powered by Probe AI\nPerformance Impact Reviewer Prompt

github-actions · 2025-07-13T10:12:51Z

Analysis of PR #7231: Revert RPC Emergency Mode Check in Health Checks

Overview

This PR reverts changes related to RPC emergency mode handling in the health check system. The key changes are:

Removal of the emergency mode check for RPC in the isCriticalFailure function
Simplification of related test cases

Detailed Changes

1. Modification to `isCriticalFailure` Function

The PR reverts the RPC critical failure logic to its previous behavior by removing the emergency mode check:

Before the PR (current implementation):

// Consider RPC critical only if using RPC and not in emergency mode
if component == "rpc" && gw.GetConfig().Policies.PolicySource == "rpc" && !rpc.IsEmergencyMode() {
    return true
}

After the PR (reverted to previous behavior):

// Consider RPC critical only if using RPC
if component == "rpc" && gw.GetConfig().Policies.PolicySource == "rpc" {
    return true
}

This change means that RPC will be considered a critical failure if it's being used as the policy source, regardless of whether it's in emergency mode or not.

2. Test Case Simplification

The PR also removes test cases related to RPC emergency mode scenarios in health_check_test.go, simplifying the test coverage for the isCriticalFailure function. The removed test cases specifically tested the behavior when RPC was in emergency mode.

Understanding the Context

Emergency Mode in Tyk

In Tyk, "emergency mode" is a fallback mechanism for RPC connections. When the RPC connection fails, the system can enter emergency mode, which allows it to continue operating using cached data. This is particularly important for MDCB (Multi Data Center Bridge) deployments.

The emergency mode is managed by the rpc package and is used in various parts of the codebase to handle RPC failures gracefully. For example:

When loading policies from RPC, if in emergency mode, it loads from a backup
When checking session details, it handles emergency mode differently
During resource synchronization, emergency mode can be triggered if all retries fail

Health Check System

The health check system in Tyk monitors various components and reports their status. The isCriticalFailure function determines whether a component failure should be considered critical, which affects the overall health status reported by the system.

The health check is used for both liveness and readiness probes, which are important for Kubernetes deployments to determine if a pod is healthy and ready to serve traffic.

Impact Analysis

Functionality Changes

Stricter Health Checks: By removing the emergency mode check, the system will now report RPC failures as critical even when in emergency mode. This means the health check will report a failure even when the system could potentially still operate using cached data.
MDCB Readiness Check: The PR title mentions "regression fixes for failing mdcb readiness check", suggesting that the previous implementation was causing issues with MDCB readiness checks. The change ensures that RPC failures are consistently treated as critical when RPC is the policy source.
Simplified Logic: The change simplifies the failure detection logic, making it more straightforward and easier to understand.

Potential Concerns

False Positives: This change might lead to more false positive critical failures being reported, as RPC will be considered critical even when the system could operate in emergency mode.
Emergency Mode Behavior: The emergency mode in RPC is a state where the RPC connection is not functioning properly, but the system can still operate using cached data. By removing this check, the health check system will not take this into account.

Conclusion

This PR reverts to a simpler, more conservative approach to health checks for RPC components. While it might result in more stringent failure reporting, it ensures that any RPC issues are properly flagged as critical when RPC is being used as the policy source. This change appears to be addressing specific regression issues with MDCB readiness checks.

The PR is marked as "DO NOT MERGE" in the title, suggesting it might be a temporary fix or under review, and not intended for immediate merging into the main codebase. This could indicate that a more comprehensive solution is being developed, or that further testing is needed before this change can be safely merged.

Tip: Mention me again using /connectivity <request>.
Powered by Probe AI\nConnectivity Issues Reviewer Prompt for Tyk Gateway

github-actions · 2025-07-13T10:13:23Z

Analysis of PR #7231: Revert RPC Emergency Mode Check in Health Check

This PR reverts changes related to how the Tyk gateway determines if RPC is a critical component during health checks. The changes specifically affect the MDCB (Multi Data Center Bridge) readiness check functionality.

Key Changes

1. Simplified RPC Critical Failure Logic

The PR removes the emergency mode check from the RPC critical failure determination:

Before:

// Consider RPC critical only if using RPC and gw not in emergency mode
if component == "rpc" && gw.GetConfig().Policies.PolicySource == "rpc" && !rpc.IsEmergencyMode() {
    return true
}

After:

// Consider RPC critical only if using RPC
if component == "rpc" && gw.GetConfig().Policies.PolicySource == "rpc" {
    return true
}

2. Simplified Test Coverage

The PR significantly simplifies the test code:

Removes the rpc package import as it's no longer needed
Removes test cases specifically for RPC emergency mode scenarios
Simplifies the test table structure for isCriticalFailure
Renames setUpConfig to setupConfig for consistency
Removes the setupRPC and setupFunc fields that were used to set up emergency mode

3. Technical Impact

Emergency Mode Behavior

The previous implementation would not consider RPC as critical when in emergency mode, even if the policy source was set to "rpc". This PR changes that behavior so that RPC is always considered critical when the policy source is "rpc", regardless of emergency mode.

MDCB Readiness Check

The PR title mentions "regression fixes for failing mdcb readiness check", indicating that the previous behavior was causing issues with MDCB readiness checks. By reverting to the simpler logic, the PR aims to fix these regression issues.

Analysis

Rationale

The emergency mode in Tyk is a fallback mechanism that activates when RPC connections fail, allowing the gateway to operate with locally cached data. The previous implementation likely aimed to prevent the readiness check from failing when in emergency mode, as RPC would naturally be unavailable in that state.

However, this appears to have caused regression issues with MDCB readiness checks. The PR reverts to a simpler approach where RPC is always considered critical when configured as the policy source, which may be more appropriate for ensuring system integrity in distributed deployments.

Potential Concerns

By removing the emergency mode check, the system will now treat RPC as critical even during emergency mode. This could potentially lead to readiness checks failing when RPC is unavailable, even though the system is designed to operate in emergency mode with local backups.

Benefits

Simpler, more straightforward logic for determining critical components
Fixes regression issues with MDCB readiness checks
Removes dependency on the rpc package in tests, making them more isolated

Conclusion

This PR represents a pragmatic approach to fixing regression issues by reverting to a simpler implementation. While it might seem counterintuitive to consider RPC critical even during emergency mode (when RPC is known to be unavailable), this approach likely provides more consistent behavior for MDCB deployments.

The PR title's "DO NOT MERGE" prefix suggests this might be a temporary fix or under evaluation, possibly to be replaced with a more comprehensive solution in the future.

Tip: Mention me again using /security <request>.
Powered by Probe AI\nSecurity Impact Reviewer Prompt

sonarqubecloud · 2025-07-13T10:28:47Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
100.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Revert "[TT-9234] regression fixes for failing mdcb readiness check (#…

e8d8ed0

…7215)" This reverts commit a564981.

buger added Story Tyk Product Development labels Jul 11, 2025

github-actions bot added the Review effort 2/5 label Jul 11, 2025

Merge branch 'master' into test-prompt

e41d7f6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[DO NOT MERGE] Revert "[TT-9234] regression fixes for failing mdcb readiness check (… #7231

[DO NOT MERGE] Revert "[TT-9234] regression fixes for failing mdcb readiness check (… #7231

Uh oh!

lghiur commented Jul 11, 2025 •

edited by github-actions bot

Loading

Uh oh!

buger commented Jul 11, 2025

Uh oh!

github-actions bot commented Jul 11, 2025

Uh oh!

github-actions bot commented Jul 11, 2025

Uh oh!

github-actions bot commented Jul 11, 2025

Uh oh!

github-actions bot commented Jul 13, 2025

Uh oh!

github-actions bot commented Jul 13, 2025

Uh oh!

github-actions bot commented Jul 13, 2025

Uh oh!

github-actions bot commented Jul 13, 2025

Uh oh!

sonarqubecloud bot commented Jul 13, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[DO NOT MERGE] Revert "[TT-9234] regression fixes for failing mdcb readiness check (… #7231

Are you sure you want to change the base?

[DO NOT MERGE] Revert "[TT-9234] regression fixes for failing mdcb readiness check (… #7231

Uh oh!

Conversation

lghiur commented Jul 11, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Type

Description

Changes diagram

Changes walkthrough 📝

Uh oh!

buger commented Jul 11, 2025

Uh oh!

github-actions bot commented Jul 11, 2025

PR Reviewer Guide 🔍

Uh oh!

github-actions bot commented Jul 11, 2025

PR Code Suggestions ✨

Uh oh!

github-actions bot commented Jul 11, 2025

Uh oh!

github-actions bot commented Jul 13, 2025

Dependency Impact Analysis: RPC Emergency Mode in Health Checks

Key Changes

Technical Impact Analysis

Emergency Mode Context

Health Check Behavior Change

MDCB Readiness Check Impact

Dependency Impact

Conclusion

Uh oh!

github-actions bot commented Jul 13, 2025

PR Analysis: Revert RPC Emergency Mode Check in Health Check

Changes Overview

Technical Context

Emergency Mode

MDCB (Multi Data Center Bridge)

Readiness Check

Impact Analysis

Problem Being Fixed

New Behavior

Conclusion

Uh oh!

github-actions bot commented Jul 13, 2025

Analysis of PR #7231: Revert RPC Emergency Mode Check in Health Checks

Overview

Detailed Changes

1. Modification to isCriticalFailure Function

2. Test Case Simplification

Understanding the Context

Emergency Mode in Tyk

Health Check System

Impact Analysis

Functionality Changes

Potential Concerns

Conclusion

Uh oh!

github-actions bot commented Jul 13, 2025

Analysis of PR #7231: Revert RPC Emergency Mode Check in Health Check

Key Changes

1. Simplified RPC Critical Failure Logic

2. Simplified Test Coverage

3. Technical Impact

Emergency Mode Behavior

MDCB Readiness Check

Analysis

Rationale

Potential Concerns

Benefits

Conclusion

Uh oh!

sonarqubecloud bot commented Jul 13, 2025

Quality Gate passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

lghiur commented Jul 11, 2025 •

edited by github-actions bot

Loading

1. Modification to `isCriticalFailure` Function