add more comprehensive drain reporting tool #7189

@grondo

Problem: flux resource drain reports only the current state of drained resources. It includes neither historical drain events nor undrain reasons, even though both are available in the resource eventlog.

Some admins have been using a one-off tool drain-report.py, which reports the history of all drain/undrain events in the eventlog (optionally for a set of targets), though it does not report undrain reasons:

$ flux python drain-report.py tuolumne1005
      NODE      T_DRAIN    T_UNDRAIN REASON                                  
tuolumne1005  Sep16 10:00  Sep16 10:47 testing -Kalan                          
tuolumne1005  Sep17 10:33  Sep17 17:52 PY: rebooting to pick up  BIOS   
tuolumne1005  Sep26 12:56  Sep26 18:21 quick drain for update -Kalan           
...
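
For discussion, here is a minimal sketch of how drain and undrain events could be paired into history rows that also carry the undrain reason. It is not the existing drain-report.py. It assumes events are dicts shaped like eventlog entries (timestamp, name, context), and it uses a hypothetical, already-expanded "nodes" list in the context so the example stays self-contained. The real resource eventlog identifies targets differently, so a real tool would need to decode that first. The function name drain_history and the sample data are made up for illustration.

```python
from datetime import datetime, timezone


def drain_history(events, target):
    """Pair each drain of `target` with the undrain that follows it.

    `events` is a list of dicts with "timestamp", "name" and "context".
    ASSUMPTION: context["nodes"] is an expanded list of hostnames; this key
    is hypothetical and stands in for whatever the real eventlog records.
    """
    rows = []
    open_row = None  # drain interval not yet closed by an undrain
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        ctx = ev["context"]
        if target not in ctx["nodes"]:
            continue
        if ev["name"] == "drain":
            if open_row is None:
                open_row = {
                    "t_drain": ev["timestamp"],
                    "t_undrain": None,
                    "reason": ctx.get("reason", ""),
                    "undrain_reason": None,
                }
                rows.append(open_row)
            else:
                # Drain of an already drained node: keep the newest reason.
                open_row["reason"] = ctx.get("reason", open_row["reason"])
        elif ev["name"] == "undrain" and open_row is not None:
            open_row["t_undrain"] = ev["timestamp"]
            open_row["undrain_reason"] = ctx.get("reason")
            open_row = None
    return rows


def fmt(ts):
    """Format a timestamp like the report above; '-' if still drained."""
    if ts is None:
        return "-"
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%b%d %H:%M")


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        {"timestamp": 1000.0, "name": "drain",
         "context": {"nodes": ["foo1", "foo2"], "reason": "testing"}},
        {"timestamp": 4000.0, "name": "undrain",
         "context": {"nodes": ["foo1"], "reason": "test done"}},
    ]
    for row in drain_history(sample, "foo1"):
        print(fmt(row["t_drain"]), fmt(row["t_undrain"]),
              row["reason"], "/", row["undrain_reason"] or "-")
```

A row whose t_undrain is None represents a node that is still drained, which lets the same structure cover both history and current state.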

@kkier noted he had some improvements he'd like to make to this simple tool (please feel free to add them to this issue, or alternatively just give your specifications for the features of a useful tool).

@kkier also suggested a tool that would report the current drain/undrain state of a set of targets when more than one target is given, e.g.:

TIMESTAMP      STATE         REASON                     NODELIST
Nov 06 08:00   drained       epilog failure XXXX        foo[1-10]
Nov 05 10:00   undrained     problem fixed              foo25
-              undrained     -                          foo[11-24,26-100]

Or similar. If a single target is specified, then it would give the entire history, including drain and undrain events, for that target.
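
One possible way to build the multi-target summary is to keep only the most recent drain or undrain event per node, then group nodes that share the same timestamp, state, and reason. The sketch below is an illustration under assumptions, not a proposed implementation: events are dicts shaped like eventlog entries, the "nodes" key in the context is a hypothetical pre-expanded host list, and current_state is a made-up name. Nodes with no events are reported as undrained with no timestamp or reason, matching the last row of the mockup above.

```python
from collections import defaultdict


def current_state(events, targets):
    """Group `targets` by their most recent drain/undrain event.

    Returns a list of ((timestamp, state, reason), [nodes]) tuples, newest
    first, with never-drained nodes (timestamp None) last.
    ASSUMPTION: context["nodes"] is an expanded list of hostnames; this key
    is hypothetical and stands in for whatever the real eventlog records.
    """
    last = {}  # node -> (timestamp, state, reason) of its latest event
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        if ev["name"] not in ("drain", "undrain"):
            continue
        state = "drained" if ev["name"] == "drain" else "undrained"
        reason = ev["context"].get("reason")
        for node in ev["context"]["nodes"]:
            last[node] = (ev["timestamp"], state, reason)

    # Nodes sharing an identical (timestamp, state, reason) form one row.
    groups = defaultdict(list)
    for node in targets:
        groups[last.get(node, (None, "undrained", None))].append(node)

    def sort_key(item):
        ts = item[0][0]
        return float("-inf") if ts is None else ts

    return sorted(groups.items(), key=sort_key, reverse=True)
```

A real tool would presumably compress each node list into hostlist form (e.g. foo[1-10]) before printing. That step is left out here to keep the sketch free of external dependencies.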
