(feat) - Add High Availability Support with Leader Election #147
Merged
InsomniaCoder merged 5 commits into main on Oct 6, 2025
tjamet
approved these changes
Oct 2, 2025
tjamet
reviewed
Oct 2, 2025
InsomniaCoder
commented
Oct 2, 2025
flag.StringVar(&schedulableArchs, "cluster-schedulable-archs", "", "Comma separated list of architectures schedulable in the cluster")
flag.StringVar(&systemOS, "system-os", "linux", "Sole OS supported by the system")
flag.StringVar(&metricsAddr, "metrics-addr", ":8080", "The address the metric endpoint binds to.")
flag.StringVar(&healthProbeAddr, "health-probe-addr", ":8081", "The address the health probe endpoint binds to.")
Keeping this as a flag to stay consistent with metricsAddr.
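For reference, a minimal sketch of how the two address flags could sit side by side. It uses only the standard `flag` package; the `addrs` struct, the `parseAddrs` helper and the `"noe"` FlagSet name are hypothetical and not taken from the PR, which registers the flags on the global flag set.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// addrs holds the two bind addresses discussed in the review thread.
// (Hypothetical type, introduced only for this sketch.)
type addrs struct {
	metrics     string
	healthProbe string
}

// parseAddrs registers both flags on a private FlagSet so it can be called
// repeatedly (for example from tests) without touching global flag state.
func parseAddrs(args []string) (addrs, error) {
	var a addrs
	fs := flag.NewFlagSet("noe", flag.ContinueOnError)
	fs.StringVar(&a.metrics, "metrics-addr", ":8080", "The address the metric endpoint binds to.")
	fs.StringVar(&a.healthProbe, "health-probe-addr", ":8081", "The address the health probe endpoint binds to.")
	err := fs.Parse(args)
	return a, err
}

func main() {
	a, err := parseAddrs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("metrics=%s health=%s\n", a.metrics, a.healthProbe)
}
```

Both addresses share the same shape (flag name, default, help text), so an operator can override either one the same way, e.g. `-health-probe-addr :9090`.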
Tested: the reconciler runs only in the leader pod. The other pod has no logs from {"controller":"*controllers.PodReconciler"} but does have webhook processing logs.
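The behaviour observed in this test can be illustrated with a toy model. It is not the PR's code and not controller-runtime's implementation; it only borrows the `NeedLeaderElection() bool` method shape from controller-runtime's `LeaderElectionRunnable` interface. The `component`, `reconciler`, `webhook` and `active` names are hypothetical, and the assumption is that the reconciler asks for leadership while the webhook does not.

```go
package main

import "fmt"

// component models something a replica may run. NeedLeaderElection follows
// the method shape of controller-runtime's LeaderElectionRunnable.
type component interface {
	Name() string
	NeedLeaderElection() bool
}

// reconciler must only run on the elected leader (assumption of this sketch).
type reconciler struct{}

func (reconciler) Name() string             { return "PodReconciler" }
func (reconciler) NeedLeaderElection() bool { return true }

// webhook serves admission requests on every replica (assumption of this sketch).
type webhook struct{}

func (webhook) Name() string             { return "webhook" }
func (webhook) NeedLeaderElection() bool { return false }

// active returns the names of the components that run on a replica,
// given whether that replica currently holds the leader lease.
func active(isLeader bool, cs ...component) []string {
	var names []string
	for _, c := range cs {
		if isLeader || !c.NeedLeaderElection() {
			names = append(names, c.Name())
		}
	}
	return names
}

func main() {
	cs := []component{reconciler{}, webhook{}}
	fmt.Println("leader:  ", active(true, cs...))
	fmt.Println("follower:", active(false, cs...))
}
```

Under these assumptions the follower runs only the webhook, which matches the logs described above: webhook processing on both pods, reconciler logs on the leader only.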
alfredolopezzz
approved these changes
Oct 6, 2025
Fsero
approved these changes
Oct 6, 2025
Problem Statement
The Noe controller currently runs as a single replica. If the controller pod fails, webhook admission requests cannot be served.
As a result, pods can be misplaced while the Noe pod is unavailable.
Proposal
Implemented a simplified High Availability architecture.
Webhook Component (Admission Controller)
Controller Component (Reconciler)
✅ Implemented health endpoints (/healthz, /readyz on port 8081)
✅ Updated Helm template with conditional replica count and anti-affinity
✅ Fixed the replica count at 2 for the HA setup to keep configuration simple
✅ Automatic PodDisruptionBudget creation for HA deployments
✅ Added comprehensive documentation to README
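As an illustration of the health endpoints in the list above, here is a minimal standard-library sketch of `/healthz` and `/readyz` handlers. The PR most likely wires these through its controller manager instead; `newProbeMux`, `probe` and the `ready` callback are hypothetical names used only here, and the readiness semantics (503 until ready) are an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newProbeMux serves /healthz (200 whenever the process can answer) and
// /readyz (200 only when ready() reports true, otherwise 503).
func newProbeMux(ready func() bool) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, _ *http.Request) {
		if !ready() {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// probe issues an in-memory GET against the handler and returns the status code.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	ready := false
	mux := newProbeMux(func() bool { return ready })
	fmt.Println(probe(mux, "/healthz"), probe(mux, "/readyz")) // 200 503

	ready = true
	fmt.Println(probe(mux, "/healthz"), probe(mux, "/readyz")) // 200 200

	// A real binary would serve the mux on the probe port instead,
	// e.g. http.ListenAndServe(":8081", mux).
}
```

Separating liveness from readiness lets Kubernetes keep a replica out of the webhook Service until it can answer admission requests, without restarting it.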