[helm] Report when helm releases get stuck in pending states #21216

@damonmaria


At the moment the helm check treats any status that is not "failed" as OK.

From the service check's description: "Returns CRITICAL for a release when its latest revision is in failed state. Returns OK otherwise."

But Helm can go wrong and leave a release stuck in a pending status (such as "pending-upgrade"). A release should pass through this status briefly while deploying, but it should not stay there.

We have had automated releases fail because a Helm release was stuck in pending-upgrade, and there was no way to report on this in Datadog: all other helm metrics cover every revision rather than only the current one, and non-OK statuses are expected in old revisions. The service check is the only way to get the state of the current revision.

Can this check be amended to report a non-OK state when the current status is pending* for an excessive amount of time?
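
To make the request concrete, here is a minimal sketch of the decision logic being asked for. It is not the check's actual implementation. The function name `release_state`, the `pending_timeout` parameter, and the 15-minute default are all hypothetical, and it assumes the check can see the latest revision's status and last-deployed timestamp.

```python
from datetime import datetime, timedelta, timezone

# Numeric values follow Datadog's service check convention (0 = OK, 2 = CRITICAL).
OK, CRITICAL = 0, 2

# Hypothetical default: the issue does not define what "excessive" means.
DEFAULT_PENDING_TIMEOUT = timedelta(minutes=15)


def release_state(status, last_deployed, now, pending_timeout=DEFAULT_PENDING_TIMEOUT):
    """Return a service check status for the latest revision of a release."""
    # Current behavior described in the issue: only "failed" is non-OK.
    if status == "failed":
        return CRITICAL
    # Requested behavior: any pending-* status that has lasted too long is non-OK.
    if status.startswith("pending-") and now - last_deployed > pending_timeout:
        return CRITICAL
    return OK


deployed_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
# Still within the timeout, so a pending release is treated as in progress.
print(release_state("pending-upgrade", deployed_at, deployed_at + timedelta(minutes=5)))   # 0
# Past the timeout, so the release is reported as stuck.
print(release_state("pending-upgrade", deployed_at, deployed_at + timedelta(minutes=30)))  # 2
```

Whether a stuck release should map to WARNING or CRITICAL, and whether the timeout should be configurable per release, are open design choices; the sketch picks CRITICAL only to match how "failed" is handled today.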
