-
Notifications
You must be signed in to change notification settings - Fork 603
OCPEDGE-1502: feat: podman-etcd: add support for cert rotation #2085
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2085/1/input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, it looks quite good, and I've added a few comments. My main concern now is that the nodes can act on certificate change at the same time, which results in the etcd-cluster being disrupted until both recover.
First of all, is it expected that both nodes can update certificate at the same time? If so, we might want to synchronize them, e.g. with CIB attributes. There are a few examples in the code, look for attribute_learner_node
function for example.
0a83ec4
to
06fd7a2
Compare
Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2085/2/input |
@clobrano I don't know about the CIB mechanism, looking into it now so I can port over the |
added a cert check function to the monitor call to force a restart of etcd when the certs have been changed Signed-off-by: ehila <[email protected]>
06fd7a2
to
6bfbe1d
Compare
Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2085/3/input |
@clobrano I don't think we'll need the CIB, was able to validate this and the cert rotation is a rolling change, so the first node updates, then the second. I think we'll be adding extra complexity to have CIB in the mix. I haven't had a chance to look at the etcd-operator code yet, but I'll find this out so we can be sure that is the behavior. In my experimentation it was a rolling process from node to node. |
If that so, I agree, there is no need to add complexity.
does it mean the node also reboots, or that CEO just waits for one to be completed before moving to the next? If the first, we need no logic at all, but I remember we reproduced a scenario when the node does not reboot after a certificate change. If the latter, we should ensure the process is long enough for the agent to see (and signal) the file change, be stopped (which might require up to 80 seconds) and restart (which is usually much quicker), and rejoin the cluster, before the peer node is affected by the certificate change. |
added a cert check function to the monitor call to force a restart of etcd when the certs have been changed