Description
Use Case
I have some web nodes acting as a load balancer front end to a pool of puppetserver compilers. I have all nodes set to auto-update, which means occasional restarts of compilers (service or host, depending on what was patched), and that means breaking compile jobs. I'd love it if there were a way to drop compiler nodes out of service by having the web front ends know the usability of a compiler node. A richer status than basic 'running' vs 'error', a la k8s startup / liveness / readiness checks.
Describe the solution you would like
I think some API could expose more statuses here, states representative of situations like ...
- "I'm starting up, I'm alive enough to tell you this, but don't send me compile jobs yet."
- "I'm running and totally have room for more."
- "I'm running but I'm overloaded, maybe I'm not your best choice?"
- "I'm gracefully shutting down soon, wrapping up what I have already accepted. You shouldn't send anything new if you can avoid it." (implying a 'graceful shutdown' mode, dare to dream).
- "I'm actively shutting down now, any compiles are about to die, sorry."
... that a proxy health checker could pick up on, using it to kick sad servers out of the pool and bring them back in when they come back around.
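To make that concrete, here's a rough sketch of the kind of shim a health check could run if /v1/simple grew these states. Everything in it is hypothetical: the state names are the wish list above (puppetserver doesn't return them today), and the hostname and port are assumed placeholders.

```python
# Hypothetical sketch only: assumes /status/v1/simple were extended to return
# the richer states proposed above (none of these exist in puppetserver today).
import urllib.request

STATUS_URL = "https://compiler01.example.com:8140/status/v1/simple"  # assumed host/port

# How a health checker might translate each proposed state into a pool decision.
ACTION_FOR_STATE = {
    "starting":   "out-of-pool",           # alive, but not ready for compile jobs yet
    "running":    "in-pool",               # healthy, room for more
    "overloaded": "in-pool-low-priority",  # keep it, but prefer other compilers
    "draining":   "out-of-pool",           # graceful shutdown, finishing accepted work
    "stopping":   "out-of-pool",           # actively shutting down
    "error":      "out-of-pool",
}

def pool_action(url: str = STATUS_URL) -> str:
    """Fetch the (hypothetical) state and map it to a load balancer action."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            state = resp.read().decode().strip()
    except OSError:  # connection refused, timeout, HTTP error, etc.
        return "out-of-pool"
    return ACTION_FOR_STATE.get(state, "out-of-pool")
```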
Describe alternatives you've considered
I've considered time-based crons (this node reboots at HH:MM), but that requires coordination across servers and doesn't cover 'emergency / off-cycle patches'.
https://www.puppet.com/docs/puppet/8/server/status-api/v1/simple
This endpoint reports back running, error, and unknown, which is great for monitoring up/down, but doesn't offer enough options for making load balancer decisions.
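For comparison, a check against today's simple endpoint boils down to string-matching the plain-text body, so the only decision it supports is in-pool vs. out-of-pool (host and port below are assumed defaults):

```python
# Current behavior: plain-text body is trivial to parse, but it's binary.
import urllib.request

def compiler_is_up(host: str, port: int = 8140) -> bool:
    url = f"https://{host}:{port}/status/v1/simple"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.read().decode().strip() == "running"
    except OSError:
        return False
```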
https://www.puppet.com/docs/puppet/8/server/status-api/v1/services
This endpoint has 'starting' and 'stopping' states, but too much detail for a naive / closer-to-HTTP status parser. It's not reasonably regex-matchable (because it's JSON, duh) and it returns results for many services instead of one for overall health. It could maybe work with a new parameter to pick up one service's status, a la /v1/services?level=critical&service=server... but at that point we're looped back to 'just make /v1/simple have more states.'
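As a stopgap, a small shim could do that aggregation itself: poll the services endpoint and collapse the per-service JSON into one answer for the load balancer. This is only a sketch under the assumption that each top-level entry in the response carries a "state" field as the docs describe, and it still doesn't surface anything like the proposed 'overloaded' or 'draining' states.

```python
# Workaround sketch: collapse the services endpoint's JSON into one state.
# Assumes each top-level service entry has a "state" field per the docs.
import json
import urllib.request

def overall_state(host: str, port: int = 8140) -> str:
    url = f"https://{host}:{port}/status/v1/services?level=critical"
    with urllib.request.urlopen(url, timeout=5) as resp:
        services = json.load(resp)
    states = {svc.get("state", "unknown") for svc in services.values()}
    if states == {"running"}:
        return "running"
    if "starting" in states or "stopping" in states:
        return "transitioning"  # out of the pool, but expected to change soon
    return "error"
```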
Additional context
No response