Commit 836f3d4

feat: Add cleanup helm hook
Signed-off-by: Jonathan Stacks <[email protected]>
1 parent f982be0 commit 836f3d4

9 files changed, +731 -0 lines changed

docs/cleanup-hook.md

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
# Cleanup Hook

## Overview

The ngrok-operator includes an optional Helm pre-delete hook that ensures proper cleanup of ngrok resources when
uninstalling the operator. Without this hook, the operator's deployment and pods may be deleted before they can
clean up ngrok API resources, leaving orphaned resources in your ngrok account.

## How It Works

When `helm uninstall` is executed, the cleanup hook:

1. **Runs before deletion**: As a `pre-delete` hook, it executes before any operator resources are removed
2. **Annotates resources**: Adds `k8s.ngrok.com/cleanup=true` to all Kubernetes resources managed by the operator that have the ngrok finalizer
3. **Processes in order**:
   - Gateway API Routes (HTTPRoute, TCPRoute, TLSRoute)
   - Core resources (Ingress, Service)
   - Gateway API Gateways
   - ngrok CRDs (CloudEndpoint, AgentEndpoint, Domain, IPPolicy, etc.)
4. **Waits for cleanup**: Monitors each resource until the operator removes the finalizer, indicating cleanup is complete
5. **Retries on failure**: Retries failed operations up to `cleanupHook.retries` times, waiting `cleanupHook.retryInterval` between attempts

This keeps the operator managers running long enough to clean up all ngrok resources before the operator itself is removed.
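
The annotate-then-wait steps above can be sketched for a single resource as follows. This is an illustrative sketch, not the chart's actual `cleanup.sh`: the function name `annotate_and_wait`, its argument order, and the polling interval are assumptions. The finalizer string is the one used in the manual cleanup commands in this document.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; NOT the chart's actual cleanup.sh.
FINALIZER="k8s.ngrok.com/finalizer"

# Annotate one resource, then poll until the ngrok finalizer is gone
# (or the resource itself has been deleted).
# Args: kind namespace name [timeout_seconds] [poll_interval_seconds]
annotate_and_wait() {
  local kind="$1" ns="$2" name="$3" timeout="${4:-300}" interval="${5:-2}"
  local waited=0 finalizers

  kubectl annotate "$kind" -n "$ns" "$name" \
    k8s.ngrok.com/cleanup=true --overwrite || return 1

  while [ "$waited" -lt "$timeout" ]; do
    # --ignore-not-found makes a deleted resource look like "no finalizers".
    # If the get itself fails, fall through and poll again.
    if finalizers="$(kubectl get "$kind" -n "$ns" "$name" --ignore-not-found \
        -o jsonpath='{.metadata.finalizers}')"; then
      case "$finalizers" in
        *"$FINALIZER"*) ;;   # still present, keep waiting
        *) return 0 ;;       # finalizer removed, cleanup is complete
      esac
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done

  echo "timed out waiting for $kind $ns/$name" >&2
  return 1
}
```

For example, `annotate_and_wait ingress default my-ingress 300` would annotate the Ingress and wait up to five minutes for the operator to remove its finalizer.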

## Configuration

The cleanup hook is configured via Helm values:

```yaml
cleanupHook:
  # Enable or disable the cleanup hook
  enabled: true

  # Maximum time to wait for all resources to be cleaned up
  timeout: 300s # 5 minutes

  # Number of times to retry on failure
  retries: 3

  # Time to wait between retries
  retryInterval: 10s

  # Resource requests/limits for the cleanup job pod
  resources:
    limits:
      cpu: 100m
      memory: 128Mi
    requests:
      cpu: 50m
      memory: 64Mi
```
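
To illustrate how `retries` and `retryInterval` might interact, here is a minimal retry wrapper. It is a sketch under assumptions: the environment variable names `RETRIES` and `RETRY_INTERVAL` are hypothetical, the interval is given in plain seconds, and `retries` is treated as the number of additional attempts after the first failure. The chart's script may differ on all of these points.

```bash
#!/usr/bin/env bash
# Hypothetical variable names; the chart may pass these values differently.
RETRIES="${RETRIES:-3}"
RETRY_INTERVAL="${RETRY_INTERVAL:-10}"   # seconds

# Run a command; on failure, retry it up to RETRIES more times,
# sleeping RETRY_INTERVAL seconds between attempts.
with_retries() {
  local attempt=0
  until "$@"; do
    attempt=$((attempt + 1))
    if [ "$attempt" -gt "$RETRIES" ]; then
      echo "giving up after $RETRIES retries: $*" >&2
      return 1
    fi
    echo "attempt failed; retry $attempt/$RETRIES in ${RETRY_INTERVAL}s" >&2
    sleep "$RETRY_INTERVAL"
  done
}
```

With the defaults above, a command that always fails is run four times in total (one initial attempt plus three retries) before the wrapper gives up.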

## When to Disable

You may want to disable the cleanup hook if:

- You want to retain ngrok resources after uninstalling the operator
- You're troubleshooting and need to prevent automatic cleanup
- You have a custom cleanup process

To disable:

```bash
helm install ngrok-operator ngrok/ngrok-operator \
  --set cleanupHook.enabled=false
```

## Troubleshooting

### Timeout Issues

If the cleanup hook times out, you can increase the timeout:

```yaml
cleanupHook:
  timeout: 600s # 10 minutes
```

### Hook Failures

Check the cleanup job logs:

```bash
kubectl logs -n ngrok-operator job/ngrok-operator-cleanup
```

### Manual Cleanup

If the hook fails, you can manually trigger cleanup by annotating resources:

```bash
# Annotate a specific ingress
kubectl annotate ingress my-ingress k8s.ngrok.com/cleanup=true

# Annotate all services with the ngrok finalizer.
# `.finalizers[]?` tolerates services that have no finalizers at all, and the
# namespace is passed via -n because `kubectl annotate svc ns/name` is not valid.
kubectl get svc -A -o json | \
  jq -r '.items[] | select(any(.metadata.finalizers[]?; contains("k8s.ngrok.com/finalizer"))) | "\(.metadata.namespace) \(.metadata.name)"' | \
  while read -r ns name; do kubectl annotate svc -n "$ns" "$name" k8s.ngrok.com/cleanup=true; done
```
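
To check what is still waiting on cleanup, a small helper can list the resources of a given kind that still carry the finalizer. This is a sketch: the function name `list_with_finalizer` is hypothetical, and it uses kubectl's `jsonpath` output with `grep` and `awk` instead of `jq`.

```bash
#!/usr/bin/env bash
FINALIZER="k8s.ngrok.com/finalizer"

# Hypothetical helper: print "namespace name" for every resource of the
# given kind whose finalizer list still contains the ngrok finalizer.
list_with_finalizer() {
  local kind="$1"
  kubectl get "$kind" -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.metadata.finalizers}{"\n"}{end}' |
    grep -F "$FINALIZER" |
    awk '{ print $1, $2 }'
}
```

Running `list_with_finalizer svc` after annotating should print fewer and fewer lines as the operator finishes cleaning up each Service.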

## Resource Processing Order

The hook processes resources in a specific order to handle dependencies:

1. **Routes first**: Gateway API routes are cleaned up before their parent gateways
2. **Core resources**: Ingress and Service resources that create ngrok CRDs
3. **Gateways**: After routes are cleaned up
4. **ngrok CRDs**: Finally, any remaining operator-managed custom resources

This ordering ensures that dependent resources are cleaned up before their dependencies, preventing validation errors.
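
The ordering above, combined with skipping resource types that are not installed, could be sketched like this. The plural resource names in `CLEANUP_ORDER` and both function names are assumptions made for illustration, not taken from the chart's script.

```bash
#!/usr/bin/env bash
# Illustrative resource names, listed in the processing order described above.
CLEANUP_ORDER="httproutes tcproutes tlsroutes ingresses services gateways cloudendpoints agentendpoints domains ippolicies"

# Succeeds only if `kubectl get` works for this resource type.
# Note: this also fails on permission or connectivity errors.
kind_exists() {
  kubectl get "$1" -A -o name >/dev/null 2>&1
}

# Print the kinds to process, in order, skipping types that are unavailable
# (for example when the Gateway API CRDs are not installed).
ordered_kinds() {
  local kind
  for kind in $CLEANUP_ORDER; do
    if kind_exists "$kind"; then
      echo "$kind"
    else
      echo "skipping $kind (resource type not available)" >&2
    fi
  done
}
```

A cleanup loop could then iterate over the output of `ordered_kinds` and handle each kind in turn.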

## Permissions

The cleanup hook requires cluster-wide permissions to:
- List and update all resource types managed by the operator
- Check if optional CRDs (like Gateway API) exist

These permissions are automatically granted via the `ngrok-operator-cleanup` ClusterRole, which is created as part of the hook.
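
If you suspect a permissions problem, `kubectl auth can-i` with impersonation can confirm whether an identity holds the verbs listed above. The sketch below checks only Ingress and Service. The function name and the way the subject is passed are assumptions, and the ServiceAccount name is not stated in this document, so it is left as a placeholder in the usage comment.

```bash
#!/usr/bin/env bash
# Hypothetical check: can the given identity list and update the core kinds?
# Usage: check_cleanup_permissions system:serviceaccount:<namespace>:<serviceaccount>
check_cleanup_permissions() {
  local subject="$1" kind verb missing=0
  for kind in ingresses services; do
    for verb in list update; do
      # `kubectl auth can-i` exits non-zero when the answer is "no".
      if ! kubectl auth can-i "$verb" "$kind" --as="$subject" --all-namespaces >/dev/null 2>&1; then
        echo "missing: $verb $kind" >&2
        missing=1
      fi
    done
  done
  return "$missing"
}
```

Impersonating with `--as` requires that your own user is allowed to impersonate ServiceAccounts in the cluster.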

## Implementation Details

The cleanup hook consists of:
- A Kubernetes Job annotated with `helm.sh/hook: pre-delete`
- A bash script (stored in a ConfigMap) that uses kubectl to annotate and monitor resources
- A minimal `bitnami/kubectl` container image in which the script runs
- A ServiceAccount with ClusterRole permissions
- Automatic cleanup via `helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded`

The bash script is stored at `helm/ngrok-operator/scripts/cleanup.sh` and can be reviewed or customized before installation.

The hook resources are removed automatically after a successful run, or before the next hook execution.

helm/ngrok-operator/README.md

Lines changed: 11 additions & 0 deletions
@@ -173,3 +173,14 @@ To uninstall the chart:
| `bindings.forwarder.tolerations` | Tolerations for the bindings forwarder pod(s) | `[]` |
| `bindings.forwarder.nodeSelector` | Node labels for the bindings forwarder pod(s) | `{}` |
| `bindings.forwarder.topologySpreadConstraints` | Topology Spread Constraints for the bindings forwarder pod(s) | `[]` |

### Cleanup Hook Configuration

| Name                             | Description                                  | Value   |
| -------------------------------- | -------------------------------------------- | ------- |
| `cleanupHook.enabled`            | Enable the pre-delete cleanup hook           | `false` |
| `cleanupHook.timeout`            | Maximum time to wait for cleanup to complete | `300`   |
| `cleanupHook.retries`            | Number of times to retry on failure          | `3`     |
| `cleanupHook.retryInterval`      | Time to wait between retries                 | `10`    |
| `cleanupHook.resources.limits`   | The resources limits for the container       | `{}`    |
| `cleanupHook.resources.requests` | The requested resources for the container    | `{}`    |
