Description
At the moment the helm check treats any status other than "failed" as OK. The documentation for the service check says:
"Returns CRITICAL for a release when its latest revision is in failed state. Returns OK otherwise."
But Helm can go wrong and get stuck in a pending status (such as "pending-upgrade"). A release should pass through this status temporarily while it is being deployed, but it should not stay there.
We have run into situations where our automated releases failed because a Helm release was stuck in pending-upgrade. It is impossible to report on this in Datadog: all other helm metrics cover all revisions, not just the current one, and non-OK statuses are expected in old revisions. The service check is the only way to get the state of the current revision.
Can this check be amended to report a non-OK state when the current status is pending* for an excessive amount of time?
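
To make the request concrete, here is a minimal sketch of the decision logic I have in mind. It is not the integration's actual code: the function name `release_check_status`, the `max_pending` parameter, and the 15-minute default are all hypothetical, and reporting CRITICAL (rather than WARNING) for a stuck release is just one possible choice. The numeric values follow the usual Datadog service check convention (0 = OK, 1 = WARNING, 2 = CRITICAL).

```python
from datetime import datetime, timedelta, timezone

# Datadog service check status values.
OK, WARNING, CRITICAL = 0, 1, 2


def release_check_status(status, last_deployed, now=None,
                         max_pending=timedelta(minutes=15)):
    """Map the latest revision's Helm status to a service check status.

    Hypothetical helper: `max_pending` is an assumed, configurable
    threshold for how long a pending-* status is tolerated.
    """
    if now is None:
        now = datetime.now(timezone.utc)

    # Current behaviour: a failed latest revision is CRITICAL.
    if status == "failed":
        return CRITICAL

    # Proposed behaviour: any pending-* status that has lasted longer
    # than the threshold is also reported as non-OK.
    if status.startswith("pending-") and now - last_deployed > max_pending:
        return CRITICAL

    return OK
```

With something like this, a release that is briefly in a pending status during a normal deploy still reports OK, and only one that has been pending longer than the threshold trips the check.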