Add batch routing test file support with inhibition assertions to amtool config routes test #5167

@FRosner

Problem

amtool currently provides two tools for validating alertmanager configuration:

  • amtool check-config validates syntax and structure. It cannot tell you whether an alert reaches the right receiver.
  • amtool config routes test tests a single alert against the routing tree interactively. It is useful for manual debugging but has two significant limitations:
    1. It does not support batch test files — you cannot define a suite of routing expectations and run them all at once.
    2. It has no notion of inhibition. You cannot assert that a warning alert is suppressed when a critical is firing.

The failure modes these gaps allow are silent and high-impact:

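  1. Wrong receiver: SomeAlert with team=backend ends up in the frontend Slack channel because a broader route was added above the team-based route without continue: true. Since continue defaults to false, matching stops at the first matching sibling and the team-based route is never evaluated. The config is syntactically valid; amtool check-config passes; the routing logic is just wrong.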
  2. Broken inhibition: A warning fires even though a critical is active for the same alert, flooding an incident channel with noise. Or inhibition is over-broad and silences warnings that should not be suppressed.

Both are semantic errors that cannot be caught today without either manual testing against a live alertmanager instance or a separate out-of-tree tool.
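To make the "wrong receiver" failure mode concrete, here is a minimal sketch of first-match routing with a continue flag. It is a simplified model written for this issue, not alertmanager's code: the route type, the buildTree helper, and the receiver names are hypothetical, and only exact label equality is modelled.

```go
package main

import "fmt"

// route is a simplified stand-in for a routing node (hypothetical type).
// Real routes also support regex and negative matchers.
type route struct {
	receiver string
	match    map[string]string // labels that must all be present with these values
	cont     bool              // mirrors `continue`; false stops at the first matching sibling
	children []*route
}

func (r *route) matches(labels map[string]string) bool {
	for k, v := range r.match {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// receivers walks the tree depth-first and returns receiver names in match order.
func (r *route) receivers(labels map[string]string) []string {
	if !r.matches(labels) {
		return nil
	}
	var out []string
	for _, c := range r.children {
		got := c.receivers(labels)
		out = append(out, got...)
		if len(got) > 0 && !c.cont {
			break // first matching sibling wins unless it sets continue
		}
	}
	if len(out) == 0 {
		out = []string{r.receiver} // no child matched: this node handles the alert
	}
	return out
}

// buildTree places a broad alertname route above the team-based route.
// cont controls whether the broad route lets matching continue.
func buildTree(cont bool) *route {
	return &route{
		receiver: "default",
		children: []*route{
			{receiver: "frontend-slack", match: map[string]string{"alertname": "SomeAlert"}, cont: cont},
			{receiver: "backend-slack", match: map[string]string{"team": "backend"}},
		},
	}
}

func main() {
	labels := map[string]string{"alertname": "SomeAlert", "team": "backend"}
	fmt.Println(buildTree(false).receivers(labels)) // [frontend-slack]
	fmt.Println(buildTree(true).receivers(labels))  // [frontend-slack backend-slack]
}
```

In the first call the backend team never hears about its own alert, and nothing in the config's syntax reveals it. That is the kind of regression a batch test file would catch.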

Proposed Feature

Extend amtool config routes test with a --test-file flag that accepts a YAML file containing a list of named test cases. Each test case fires one or more alerts together and asserts per-alert expectations: the expected receiver list, or that the alert is inhibited.

Running all alerts in a test case together is the key property that makes inhibition testing possible: a severity=critical alert can suppress a severity=warning alert within the same case, mirroring how a live alertmanager would behave.
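The reason the alerts have to be evaluated together can be shown with a toy inhibition check. This is a sketch under simplifying assumptions, not the real inhibitor: inhibitRule, matchesAll, and inhibited are hypothetical names, matching is exact-equality only, and the source and target matchers are assumed to be disjoint so an alert cannot inhibit itself.

```go
package main

import "fmt"

type labels map[string]string

// inhibitRule is a simplified model of one inhibit rule (hypothetical type).
type inhibitRule struct {
	source labels   // an alert matching these labels can inhibit others
	target labels   // alerts matching these labels can be inhibited
	equal  []string // label names whose values must agree on source and target
}

// matchesAll reports whether have contains every key/value pair in want.
func matchesAll(want, have labels) bool {
	for k, v := range want {
		if have[k] != v {
			return false
		}
	}
	return true
}

// inhibited reports whether candidate is muted by any alert in fired.
func inhibited(rule inhibitRule, candidate labels, fired []labels) bool {
	if !matchesAll(rule.target, candidate) {
		return false
	}
	for _, src := range fired {
		if !matchesAll(rule.source, src) {
			continue
		}
		same := true
		for _, name := range rule.equal {
			if src[name] != candidate[name] {
				same = false
				break
			}
		}
		if same {
			return true
		}
	}
	return false
}

func main() {
	rule := inhibitRule{
		source: labels{"severity": "critical"},
		target: labels{"severity": "warning"},
		equal:  []string{"alertname"},
	}
	warning := labels{"alertname": "SomeAlert", "severity": "warning"}
	critical := labels{"alertname": "SomeAlert", "severity": "critical"}
	other := labels{"alertname": "AlertOne", "severity": "critical"}

	fmt.Println(inhibited(rule, warning, []labels{critical, warning})) // true
	fmt.Println(inhibited(rule, warning, []labels{other, warning}))    // false
	fmt.Println(inhibited(rule, warning, []labels{warning}))           // false
}
```

The last call is the point: tested in isolation, the warning can never be observed as inhibited, so a single-alert test mode cannot express the assertion.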

The command exits with code 0 if all cases pass and with code 1 if any case fails, which makes it CI-friendly by default.

Proposed YAML Format

tests:
  # Anything not matched by a specific route falls through to the default receiver.
  - name: "Unmatched alert routes to default receiver"
    alerts:
      - labels:
          alertname: SomeAlert
        expected_receivers:
          - default

  # Watchdog is a synthetic heartbeat alert. It must not page anyone.
  - name: "Watchdog alert routes to null receiver"
    alerts:
      - labels:
          alertname: Watchdog
          severity: critical
        expected_receivers:
          - "null"

  # Team-based routing.
  - name: "Team A alert routes to team-a-slack"
    alerts:
      - labels:
          alertname: TeamAAlert
          team: team-a
        expected_receivers:
          - team-a-slack

  # Inhibition: a critical suppresses a warning with the same alertname.
  # Both alerts are fired together so the inhibitor can evaluate the relationship.
  - name: "critical suppresses warning with same alertname"
    alerts:
      - labels:
          alertname: SomeAlert
          severity: critical
        expected_receivers:
          - default
      - labels:
          alertname: SomeAlert
          severity: warning
        expected_inhibited: true

  # Inhibition boundary: a critical for AlertOne does NOT suppress a warning
  # for AlertTwo because the inhibit rule requires equal alertname.
  - name: "critical does NOT suppress warning with different alertname"
    alerts:
      - labels:
          alertname: AlertOne
          severity: critical
        expected_receivers:
          - default
      - labels:
          alertname: AlertTwo
          severity: warning
        expected_receivers:
          - default

Fields:

  • expected_receivers: ordered list of receiver names the alert must match. Order matters because alertmanager's routing order is significant when continue: true is used.
  • expected_inhibited: set to true to assert the alert is suppressed. Omit (or leave false) otherwise. Do not set both on the same alert.
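One possible shape for the in-memory representation of this file, with the mutual-exclusion rule enforced up front, is sketched below. The type names and the validate method are hypothetical. The yaml struct tags only document the intended field mapping; no YAML library is imported here. Rejecting an alert that sets neither assertion is an assumption of this sketch and is not stated in the proposal.

```go
package main

import "fmt"

// alertCase mirrors one entry under `alerts` (hypothetical type).
type alertCase struct {
	Labels            map[string]string `yaml:"labels"`
	ExpectedReceivers []string          `yaml:"expected_receivers"`
	ExpectedInhibited bool              `yaml:"expected_inhibited"`
}

// testCase mirrors one entry under `tests`.
type testCase struct {
	Name   string      `yaml:"name"`
	Alerts []alertCase `yaml:"alerts"`
}

// testFile mirrors the top-level document.
type testFile struct {
	Tests []testCase `yaml:"tests"`
}

// validate rejects alerts whose assertions are contradictory or missing.
func (f testFile) validate() error {
	for _, tc := range f.Tests {
		for i, a := range tc.Alerts {
			hasReceivers := len(a.ExpectedReceivers) > 0
			switch {
			case hasReceivers && a.ExpectedInhibited:
				return fmt.Errorf("test %q, alert %d: expected_receivers and expected_inhibited are mutually exclusive", tc.Name, i)
			case !hasReceivers && !a.ExpectedInhibited:
				// Assumption of this sketch: an alert with no assertion is a mistake.
				return fmt.Errorf("test %q, alert %d: no assertion set", tc.Name, i)
			}
		}
	}
	return nil
}

func main() {
	f := testFile{Tests: []testCase{{
		Name: "critical suppresses warning with same alertname",
		Alerts: []alertCase{
			{Labels: map[string]string{"alertname": "SomeAlert", "severity": "critical"}, ExpectedReceivers: []string{"default"}},
			{Labels: map[string]string{"alertname": "SomeAlert", "severity": "warning"}, ExpectedInhibited: true},
		},
	}}}
	fmt.Println(f.validate() == nil) // true
}
```

Failing fast on a malformed test file keeps a contradictory assertion from being reported as a routing failure.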

Expected CLI Output

  PASS Unmatched alert routes to default receiver
  PASS Watchdog alert routes to null receiver
  PASS Team A alert routes to team-a-slack
  PASS critical suppresses warning with same alertname
  FAIL critical does NOT suppress warning with different alertname
       alert {alertname=AlertTwo, severity=warning}:
         expected: default
         actual:   (inhibited)
=== routing tests: 4 passed, 1 failed ===
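A reporter producing roughly this layout, along with the matching exit code, could look like the sketch below. The result and failure types and the render function are hypothetical; the exact spacing is copied from the sample output above and is not a committed format.

```go
package main

import (
	"fmt"
	"strings"
)

// failure describes one alert whose assertion did not hold (hypothetical type).
type failure struct {
	alert    string // rendered label set, e.g. "{alertname=AlertTwo, severity=warning}"
	expected string
	actual   string
}

// result is the outcome of one named test case.
type result struct {
	name     string
	failures []failure
}

// render returns the report text and the process exit code (0 = all passed).
func render(results []result) (string, int) {
	var b strings.Builder
	passed, failed := 0, 0
	for _, r := range results {
		if len(r.failures) == 0 {
			passed++
			fmt.Fprintf(&b, "  PASS %s\n", r.name)
			continue
		}
		failed++
		fmt.Fprintf(&b, "  FAIL %s\n", r.name)
		for _, f := range r.failures {
			fmt.Fprintf(&b, "       alert %s:\n", f.alert)
			fmt.Fprintf(&b, "         expected: %s\n", f.expected)
			fmt.Fprintf(&b, "         actual:   %s\n", f.actual)
		}
	}
	fmt.Fprintf(&b, "=== routing tests: %d passed, %d failed ===\n", passed, failed)
	code := 0
	if failed > 0 {
		code = 1
	}
	return b.String(), code
}

func main() {
	report, code := render([]result{
		{name: "Unmatched alert routes to default receiver"},
		{name: "critical does NOT suppress warning with different alertname", failures: []failure{
			{alert: "{alertname=AlertTwo, severity=warning}", expected: "default", actual: "(inhibited)"},
		}},
	})
	fmt.Print(report)
	fmt.Println("exit code:", code)
}
```

Returning the exit code instead of calling os.Exit inside render keeps the function easy to unit test.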

Implementation Notes

Routing already works via the existing amtool code path. dispatch.NewRoute(cfg.Route, nil).Match(labelSet) returns the matched routes in order, and the receiver name of each one (RouteOpts.Receiver) gives the same receiver list a live alertmanager would produce.

Inhibition requires a minimal in-memory provider.Alerts implementation. The inhibitor is designed to work against a live alert store; the workaround is a fakeAlerts struct that serves a fixed set of alerts from a buffered channel:

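func (f *fakeAlerts) Subscribe() provider.AlertIterator {
    // Size the buffer to hold every alert so the sends below never block.
    ch := make(chan *types.Alert, len(f.alerts))
    for _, a := range f.alerts {
        ch <- a
    }
    // done is handed to the iterator for its shutdown signalling.
    done := make(chan struct{})
    return provider.NewAlertIterator(ch, done, nil)
}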

The inhibitor is constructed with this fake provider and its Run() goroutine is started. After a brief pause that lets it consume the buffered alerts, Mutes(labelSet) is called for each alert to check whether it is suppressed.

Test runner loop:

  1. For each test case, collect all alert label sets.
  2. Construct a fakeAlerts provider with all alerts in the case.
  3. Start the inhibitor with Run().
  4. For each alert in the case: check inhibitor.Mutes(labels) first; if not inhibited, call dispatch.NewRoute(...).Match(labels).
  5. Compare results against assertions; record PASS or FAIL.
  6. Tear down the inhibitor.
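The loop above can be sketched independently of alertmanager's packages by hiding the inhibitor and the routing tree behind a small interface. Everything here is hypothetical: the engine interface, stubEngine, runCase, and run are illustration names, and the stub hard-codes a "critical inhibits warning with equal alertname" rule and a single default receiver. A real implementation would back the interface with the inhibitor and dispatch.Route.

```go
package main

import "fmt"

type alertCase struct {
	labels            map[string]string
	expectedReceivers []string
	expectedInhibited bool
}

type testCase struct {
	name   string
	alerts []alertCase
}

// engine is a hypothetical seam, not an amtool or alertmanager type.
type engine interface {
	start(alerts []map[string]string) (stop func()) // steps 2-3; stop is step 6
	mutes(labels map[string]string) bool            // inhibition check
	match(labels map[string]string) []string        // receiver names in routing order
}

// stubEngine: a critical inhibits a warning with the same alertname; all routes go to "default".
type stubEngine struct{ critical map[string]bool }

func (s *stubEngine) start(alerts []map[string]string) func() {
	s.critical = map[string]bool{}
	for _, a := range alerts {
		if a["severity"] == "critical" {
			s.critical[a["alertname"]] = true
		}
	}
	return func() { s.critical = nil }
}

func (s *stubEngine) mutes(l map[string]string) bool {
	return l["severity"] == "warning" && s.critical[l["alertname"]]
}

func (s *stubEngine) match(map[string]string) []string { return []string{"default"} }

// sameOrder compares receiver lists element by element, since order matters.
func sameOrder(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// runCase executes steps 1-6 for one case and returns failure messages.
func runCase(e engine, tc testCase) []string {
	all := make([]map[string]string, 0, len(tc.alerts))
	for _, a := range tc.alerts {
		all = append(all, a.labels)
	}
	stop := e.start(all)
	defer stop()

	var failures []string
	for _, a := range tc.alerts {
		inhibited := e.mutes(a.labels) // inhibition is checked before routing
		switch {
		case inhibited && !a.expectedInhibited:
			failures = append(failures, fmt.Sprintf("%v: expected %v, actual (inhibited)", a.labels, a.expectedReceivers))
		case inhibited:
			// inhibited as asserted
		case a.expectedInhibited:
			failures = append(failures, fmt.Sprintf("%v: expected (inhibited), actual %v", a.labels, e.match(a.labels)))
		default:
			if got := e.match(a.labels); !sameOrder(got, a.expectedReceivers) {
				failures = append(failures, fmt.Sprintf("%v: expected %v, actual %v", a.labels, a.expectedReceivers, got))
			}
		}
	}
	return failures
}

// run prints PASS/FAIL per case and returns the exit code.
func run(e engine, cases []testCase) int {
	exit := 0
	for _, tc := range cases {
		failures := runCase(e, tc)
		if len(failures) == 0 {
			fmt.Println("PASS", tc.name)
			continue
		}
		exit = 1
		fmt.Println("FAIL", tc.name)
		for _, f := range failures {
			fmt.Println("    ", f)
		}
	}
	return exit
}

func main() {
	cases := []testCase{{name: "critical suppresses warning with same alertname", alerts: []alertCase{
		{labels: map[string]string{"alertname": "SomeAlert", "severity": "critical"}, expectedReceivers: []string{"default"}},
		{labels: map[string]string{"alertname": "SomeAlert", "severity": "warning"}, expectedInhibited: true},
	}}}
	fmt.Println("exit code:", run(&stubEngine{}, cases))
}
```

Keeping the runner behind an interface like this would also let the runner's own comparison logic be unit tested without starting an inhibitor goroutine.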

Inhibition is checked before receiver matching. A real alertmanager sends no notification for an inhibited alert, so a receiver assertion on such an alert is not meaningful; the test should assert inhibition explicitly via expected_inhibited: true.

For full context on this approach, see: https://dev.to/frosnerd/unit-testing-alertmanager-routing-and-inhibition-rules-1hj4

Relationship to Existing Issues

No existing issue covers batch test files together with inhibition assertions.
