
Conversation

sobotklp

@sobotklp sobotklp commented Aug 28, 2025

Description of the change

This is a fix for #23036

It implements graceful failover for terminations of Redis Cluster master pods. This will limit the impact of terminations and failover events, since Redis cluster clients will have the chance to update their topology before the previous pod terminates. This will improve uptime during planned maintenance events.

Benefits

Improved uptime by failing master pods over to other replicas.

Possible drawbacks

If PodDisruptionBudget isn't being used, it may initiate a failover to a node that is also about to be terminated. The script attempts to filter out replicas that aren't candidates for promotion.

Applicable issues

Additional information

This is an attempt to mirror the graceful failover behaviour of the redis and valkey charts when using Sentinel.

Checklist

  • Chart version bumped in Chart.yaml according to semver. This is not necessary when the changes only affect README.md files.
  • Variables are documented in the values.yaml and added to the README.md using readme-generator-for-helm
  • Title of the pull request follows this pattern [bitnami/<name_of_the_chart>] Descriptive title
  • All commits signed off and in agreement of Developer Certificate of Origin (DCO)

@github-actions github-actions bot added redis-cluster triage Triage is needed labels Aug 28, 2025
@github-actions github-actions bot requested a review from carrodher August 28, 2025 16:28
@sobotklp sobotklp force-pushed the redis-cluster-prestop branch 2 times, most recently from 75ff4fa to a83c273 Compare August 28, 2025 16:32
@carrodher carrodher changed the title [bitnami/redis-cluster]: add preStop hook that gracefully fails over … [bitnami/redis-cluster]: add preStop hook that gracefully fails over master nodes on pod termination Aug 28, 2025
@carrodher
Member

Thank you for initiating this pull request. We appreciate your effort. This is just a friendly reminder that signing your commits is important. Your signature certifies that you either authored the patch or have the necessary rights to contribute to the changes. You can find detailed information on how to do this in the “Sign your work” section of our contributing guidelines.

Feel free to reach out if you have any questions or need assistance with the signing process.

@sobotklp sobotklp force-pushed the redis-cluster-prestop branch from ffefc85 to 6d09087 Compare August 28, 2025 16:58
sobotklp and others added 2 commits September 4, 2025 19:59
…master nodes on pod termination

Signed-off-by: Lewis Sobotkiewicz <[email protected]>
Signed-off-by: Bitnami Bot <[email protected]>
@sobotklp sobotklp force-pushed the redis-cluster-prestop branch from c0cbf9b to 59f9262 Compare September 5, 2025 02:00
@sobotklp
Author

sobotklp commented Sep 5, 2025

> Thank you for initiating this pull request. We appreciate your effort. This is just a friendly reminder that signing your commits is important. Your signature certifies that you either authored the patch or have the necessary rights to contribute to the changes. You can find detailed information on how to do this in the “Sign your work” section of our contributing guidelines.
>
> Feel free to reach out if you have any questions or need assistance with the signing process.

Hi @carrodher . Thanks for the tips on signing my commits. I've rebased my commit and amended my change with my signature. Let me know if I can do anything else. :)

bitnami-bot and others added 2 commits September 5, 2025 02:05
Signed-off-by: Bitnami Bot <[email protected]>
Signed-off-by: Lewis Sobotkiewicz <[email protected]>
@sobotklp
Author

sobotklp commented Sep 5, 2025

The script uses this basic logic on shutdown:

  • If the node is a master node, then
    • Get the node ID
    • List the available replicas for the node ID
    • Select a replica from the list and attempt to call CLUSTER FAILOVER.

I tested this by deploying a 9-node, 3-shard cluster locally without authentication or TLS, then manually terminating master nodes. I could see that the commands ROLE and CLUSTER MYID were called on the terminating master node, as expected. On the target replicas, I saw the expected output in the logs:

 * Manual failover user request accepted.
 * Received replication offset for paused master manual failover: 1148
 * All master replication stream processed, manual failover can start.
 * Start of election delayed for 0 milliseconds (rank #0, offset 1148).
 * Starting a failover election for epoch 11.

It's also possible to kubectl exec -it -- bash into a master node and run the script manually, initiating a manual failover without terminating the pod.
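The steps above can be sketched in bash. This is a hedged sketch of the described flow, not the chart's actual script: the `redis_cli` wrapper, function names, and replica-filtering criteria are illustrative assumptions, and the real script would also handle authentication and TLS.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the preStop failover flow described above.
# Function names and flags are illustrative; auth/TLS handling omitted.

redis_cli() {
    # Placeholder wrapper; the real script would add host, port, auth, and TLS flags.
    redis-cli "$@"
}

is_master() {
    # ROLE prints "master" or "slave" on its first line.
    [[ "$(redis_cli ROLE | head -n 1)" == "master" ]]
}

get_node_id() {
    redis_cli CLUSTER MYID
}

get_replica_ips() {
    # CLUSTER NODES lines: <id> <ip:port@bus> <flags> <master-id> ... <link-state> ...
    # Keep only connected, non-failing replicas of the given master ID.
    redis_cli CLUSTER NODES | awk -v master="$1" \
        '$4 == master && $3 ~ /slave/ && $3 !~ /fail/ && $8 == "connected" { split($2, a, ":"); print a[1] }'
}

prestop_failover() {
    is_master || return 0
    local node_id
    node_id="$(get_node_id)"
    mapfile -t replica_ips < <(get_replica_ips "$node_id")
    if (( ${#replica_ips[@]} == 0 )); then
        echo "No replica found for failover; proceeding with shutdown"
        return 0
    fi
    # Ask the first candidate replica to take over the shard.
    redis_cli -h "${replica_ips[0]}" CLUSTER FAILOVER
}
```

Running the failover against the replica (rather than the terminating master) matters because CLUSTER FAILOVER must be sent to a replica of the master to be demoted.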

@javsalgar javsalgar added verify Execute verification workflow for these changes in-progress labels Sep 5, 2025
@github-actions github-actions bot removed the triage Triage is needed label Sep 5, 2025
@github-actions github-actions bot removed the request for review from carrodher September 5, 2025 06:53
@github-actions github-actions bot requested a review from migruiz4 September 5, 2025 06:53
if [[ "$result" == "OK" ]]; then
{{- if .Values.cluster.redisShutdownWaitFailover }}
# Wait for clients to update their topology
sleep 10


Should we make this configurable instead of fixed 10 seconds?

Author

@sobotklp sobotklp Sep 5, 2025


It could be configurable up to a maximum of {{- $.Values.redis.terminationGracePeriodSeconds }}, I would think.

Author


I updated this to wait $.Values.redis.terminationGracePeriodSeconds - 10 seconds
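The wait calculation could look something like the following. This is a hedged sketch under assumptions: how the grace period reaches the script (argument, env var, or templated literal) is not specified here, and the 10-second headroom mirrors the discussion above rather than any fixed chart behavior.

```shell
# Hypothetical sketch: derive the post-failover wait from the pod's
# termination grace period, leaving ~10s of headroom before SIGKILL.
compute_wait_seconds() {
    local grace="${1:-30}"        # 30s is the Kubernetes default grace period
    local wait=$(( grace - 10 ))
    if (( wait < 0 )); then
        wait=0                    # never sleep a negative duration
    fi
    echo "$wait"
}
# The preStop hook would then run something like:
#   sleep "$(compute_wait_seconds "$TERMINATION_GRACE_PERIOD")"
# where TERMINATION_GRACE_PERIOD is an assumed, chart-injected value.
```

Clamping at zero covers clusters configured with a grace period shorter than the headroom.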

mapfile -t REPLICA_IPS < <( get_replica_ips )
NUM_REPLICAS=${#REPLICA_IPS[@]}
echo "Found $NUM_REPLICAS available replicas"


This should cover the case of 0 replicas found, but maybe it's better to add a warning message:
No replica found for failover; proceeding with shutdown
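The suggested guard could be sketched as follows. This is illustrative only: `get_replica_ips` is assumed to print one candidate replica IP per line, and the function name is hypothetical.

```shell
# Hypothetical sketch of the suggested zero-replica warning.
# get_replica_ips is an assumed helper printing one replica IP per line.
warn_if_no_replicas() {
    mapfile -t replica_ips < <(get_replica_ips)
    if (( ${#replica_ips[@]} == 0 )); then
        # Nothing to fail over to; let the pod shut down normally.
        echo "No replica found for failover; proceeding with shutdown"
        return 0
    fi
    echo "Found ${#replica_ips[@]} available replicas"
}
```

Returning success in the empty case keeps the preStop hook from blocking an otherwise normal shutdown.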


This Pull Request has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thank you for your contribution.

@github-actions github-actions bot added the stale 15 days without activity label Sep 21, 2025
@sobotklp
Author

@migruiz4 would you like me to do anything more here to validate this PR?

Signed-off-by: Bitnami Bot <[email protected]>
Labels
in-progress redis-cluster stale 15 days without activity verify Execute verification workflow for these changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[bitnami/redis-cluster] Request failover when a master node gracefully shuts down
6 participants