-
Notifications
You must be signed in to change notification settings - Fork 10.2k
[bitnami/redis-cluster]: add preStop hook that gracefully fails over master nodes on pod termination #36221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
75ff4fa
to
a83c273
Compare
Thank you for initiating this pull request. We appreciate your effort. This is just a friendly reminder that signing your commits is important. Your signature certifies that you either authored the patch or have the necessary rights to contribute to the changes. You can find detailed information on how to do this in the “Sign your work” section of our contributing guidelines. Feel free to reach out if you have any questions or need assistance with the signing process. |
ffefc85
to
6d09087
Compare
…master nodes on pod termination Signed-off-by: Lewis Sobotkiewicz <[email protected]>
Signed-off-by: Bitnami Bot <[email protected]>
c0cbf9b
to
59f9262
Compare
Hi @carrodher . Thanks for the tips on signing my commits. I've rebased my commit and amended my change with my signature. Let me know if I can do anything else. :) |
Signed-off-by: Bitnami Bot <[email protected]>
Signed-off-by: Lewis Sobotkiewicz <[email protected]>
The script uses this basic logic on pod termination:
I tested this by deploying a 9-node, 3-shard cluster locally without authentication or TLS, then manually terminating master nodes.
|
It's also possible to |
if [[ "$result" == "OK" ]]; then | ||
{{- if .Values.cluster.redisShutdownWaitFailover }} | ||
# Wait for clients to update their topology | ||
sleep 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make this configurable instead of fixed 10 seconds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it could be configurable up to a maximum of {{- $.Values.redis.terminationGracePeriodSeconds }}
I would think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated this to wait $.Values.redis.terminationGracePeriodSeconds - 10
seconds
mapfile -t REPLICA_IPS < <( get_replica_ips ) | ||
NUM_REPLICAS=${#REPLICA_IPS[@]} | ||
echo "Found $NUM_REPLICAS available replicas" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should cover 0 replicas found. But maybe its better to add a warning message?
No replica found for failover; proceeding with shutdown
This Pull Request has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thank you for your contribution. |
Signed-off-by: Lewis Sobotkiewicz <[email protected]>
@migruiz4 would you like me to do anything more here to validate this PR? |
Signed-off-by: Bitnami Bot <[email protected]>
Description of the change
This is a fix for #23036
It implements graceful failover for terminations of Redis Cluster master pods. This will limit the impact of terminations and failover events, since Redis cluster clients will have the chance to update their topology before the previous pod terminates. This will improve uptime during planned maintenance events.
Benefits
Improved uptime by failing master pods over to other replicas.
Possible drawbacks
If
PodDisruptionBudget
isn't being used, it may initiate a failover to a node that is also about to also be terminated. The script will attempt to filter out replicas that aren't candidates for promotion.Applicable issues
Additional information
This is an attempt to mirror the graceful failover behaviour of the
redis
andvalkey
charts when using Sentinel.Checklist
Chart.yaml
according to semver. This is not necessary when the changes only affect README.md files.README.md
using readme-generator-for-helm