-
Notifications
You must be signed in to change notification settings - Fork 36
feat: helm pre-delete hook #709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
jonstacks
wants to merge
9
commits into
main
Choose a base branch
from
stacks/helm-delete-hook
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Jonathan Stacks <[email protected]>
Signed-off-by: Jonathan Stacks <[email protected]>
…on to base controller and controller event handler Signed-off-by: Jonathan Stacks <[email protected]>
…n gateway resource Signed-off-by: Jonathan Stacks <[email protected]>
…inding resources Signed-off-by: Jonathan Stacks <[email protected]>
…perator Signed-off-by: Jonathan Stacks <[email protected]>
Signed-off-by: Jonathan Stacks <[email protected]>
195cd46 to
836f3d4
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #709 +/- ##
==========================================
+ Coverage 48.53% 48.98% +0.45%
==========================================
Files 95 95
Lines 10500 10575 +75
==========================================
+ Hits 5096 5180 +84
+ Misses 5049 5024 -25
- Partials 355 371 +16 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area/controller
Issues dealing with the controller
area/helm-chart
Issues dealing with the helm chart
documentation
Improvements or additions to documentation
size/XXL
Denotes a PR that changes 1000+ lines
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Relates to:
What
Background
The ngrok-operator watches
Service[type=Loadbalancer],Ingress, Gateway API resources, and ngrok CRD types. When it creates resources in the ngrok API for these, it adds the ngrok finalizer to these resources to make sure that those ngrok resources get cleaned up on deletion of the resources.These controllers are responsible for removing the finalizers from the resources they manage after cleaning up the ngrok resources.
When the manager pods start terminating, their context is cancelled and they stop reconciling resources. Once these manager pods are terminated they can no longer cleanup resources or remove the finalizers.
Problem
When uninstalling the helm chart, there is a race condition where the manager pods are deleted before they have time to clean up the resources they owned. This leaves resources in the cluster with the ngrok finalizer present, but no way remove the finalizer. This makes the uninstall fail and wedges these resources in a bad state.
What is more, is that if you you run helm uninstall and some of the resources are cleaned up, but others like the
KubernetesOperatordon't get processed in time, when you try to re-install, the new manager pods will clean up the finalizer from the existing resources and then the CRD will be deleted after install, leaving you in a bad state. This is what may be happening happening in #597.How
Helm Hook
This PR adds a helm pre-delete hook, initially disabled by default in the chart while we continue to test it. The pre-delete hook installs a cluster role to give the helm delete hook permissions to find resources with the ngrok finalizer and then annotation those resources with a
k8s.ngrok.com/cleanup: "true"annotation.k8s.ngrok.com/cleanup: "true"annotation support for controllersThis PR adds support for handling a
k8s.ngrok.com/cleanup: "true"annotation to our controllers. When objects are annotated with this annotation:Alternatives considered
I was contemplating just triggering deletes for all of the resources that have our finalizer and letting our controllers handle the cleanup, removing the annotation, and then letting them be garbage collected. The reason I didn't opt for this approach is that we don't create Ingress, Service, and Gateway API resources. They are created by users. It feels unexpected for us to delete them since we didn't create them.
Additional Concerns
This is currently only implemented as a helm pre-delete hook. This does not cover the case where the manager deployments or ngrok-operator namespace is deleted directly. That case is difficult to handle. When the namespace id deleted, it starts deleting all namespaced resources including the service account which grants the managers permissions to remove the finalizers. It also terminates the manager pods which prevent finalizers on resources in different namespaces from being cleaned up.
I have some other thoughts on how we can handle some of these cases that will make use of the annotation work, but need to do more testing. This also only works when using helm since this hook isn't supported by other tools like ArgoCD, for example. I think the annotation work can be kept, but we might need to supplement the pre-delete hook with other methods to make uninstall cleanup easier.
Breaking Changes
There should be no breaking changes. This feature is disabled by default currently in the helm chart and has to be enabled with
cleanupHook.enabled=true(We do enable it for our devenv to test it). Nothing in the codebase annotates these currently, so those paths should be unaffected.